In Memory of Professor Albert Szent-Gyorgyi


SIDNEY  FOX 
In paying tribute to Albert Szent-Gyorgyi, one can call on a wealth of recorded
observation. The main details are well known and others are documented. Albert was
a regular participant at the Sanibel conference; this tribute stems from that
special relationship.

Since  my  days  as  a  graduate  student  in  the  late  1930s  I  recall  Szent-Gyorgyi  as  a 
teacher,  although  not  in  the  classroom  but  through  scientific  literature.  I  know  that 
numerous  others  have  likewise  perceived  Albert  as  one  of  their  most  remembered 
non-classroom  teachers. 

The  extent  of  this  teaching  relationship  was  emphasized  for  me  on  a  global  scale 
while  in  China  for  the  month  of  October.  Visiting  a  number  of  institutes  in  Beijing 
and  Shanghai,  I  found  quotations  in  (only)  two  of  the  entrance  foyers.  One  of  these 
was  translated  for  me  from  the  Chinese  by  my  host.  I  could  neither  translate  it  nor 
transcribe the rapid translation I heard. The other, also a mural quotation of
Szent-Gyorgyi, was in English: “Research is to see what everyone else has seen
and to think what no one else has thought.”

Szent-Gyorgyi’s influence reached far. His warm human qualities are illustrated by
a personal quotation of what he said to Mrs. Fox and me outside a meeting room. We
three spotted the lovely Mrs. (Marcia) Szent-Gyorgyi approaching in the corridor.
Albert said, “There is the best discovery I ever made.”

Szent-Gyorgyi’s contributions to, and suggestions in, science were marked by an
outstanding ability to reduce concepts to simple terms. He was unmatched in his
ability to place his feet in one area of science and to look elsewhere, see what few
others have seen, and “to think what no one else has thought.”

It  is  painful  to  say  goodbye  to  Albert  Szent-Gyorgyi.  The  pain  is  however  alloyed 
with  joy  when  we  recall  that  we  are  all  better  off  for  the  fact  that  this  “man  of  the 
century”  lived  in  our  time. 


INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY: QUANTUM BIOLOGY SYMPOSIUM 14, 001 (1987)

© 1987 by John Wiley & Sons, Inc. CCC 0360-8832/87/010001-01 $04.00


Albert Szent-Gyorgyi’s Impact on Theoretical Biophysics†

JÁNOS J. LADIK

Institute for Theoretical Chemistry and Laboratory of the National Foundation for Cancer Research at
the Friedrich-Alexander-University Erlangen-Nürnberg, D-8520 Erlangen, Egerlandstr. 3, West Germany


Introduction 

Among  the  many  ideas  with  which  Albert  Szent-Gyorgyi  has  inspired  biophysics 
research,  three  examples  should  be  discussed  here  in  detail. 

In 1941 he proposed, in order to explain certain biological phenomena, that under
certain conditions proteins can become conductors [1]. Later it was found, for instance,
that if light is absorbed by chromophore A situated at the end of a polypeptide chain, the
energy of the excited electron is emitted in the form of fluorescence at the other end
of the chain (200-300 Å from the first) by another chromophore, B [2]. (For further
biological examples which make electronic conduction in proteins probable see
Ref. 3.) With the modern quantum theory of solids it was possible to show that this can
happen under two conditions:

First, if there are free charge carriers in the proteins (which very probably can be
generated under biological conditions with the help of charge transfer, CT).

Second,  despite  the  strong  side  chain  disorder  of  proteins  there  is  the  possibility  of 
non-negligible  conduction  due  to  hopping.  The  conditions  of  the  occurrence  of  such 
hopping  will  be  discussed  in  detail. 

Szent-Gyorgyi has predicted also that proteins can form higher structures only if
they are conductors [4]. Again with the help of modern quantum theory it was possible
to show that this is the case, because the van der Waals forces are much larger
between interacting conducting chains than between insulator chains.

Szent-Gyorgyi often emphasized that easy energy and charge transport in DNA and
proteins are necessary for normal cell functioning. He pointed out that if the flow of
electric charges in these macromolecules is hindered, this can lead to a cancerous state
[5]. It will be shown below that to have an oxygen metabolism (or efficient
photosynthesis in plants) it is really necessary to have a non-negligible conduction of
proteins, which was originally Szent-Gyorgyi’s main argument in his early papers [1],
in which he assumed electronic conduction in proteins.


†In memoriam of my Great Teacher and Respected Friend, October 22, 1986.
Hopping Conduction in Disordered Proteins

Szent-Gyorgyi’s suggestion [1] that proteins can become semiconductors was
generally  rejected  in  its  time.  The  two  main  counterarguments  were  (1)  that  the 
fundamental  energy  gap  in  proteins  is  too  large,  and  (2)  that,  due  to  the  disorder 
caused  by  the  different  side  chains,  proteins  do  not  have  continuous  regions  of 
allowed  energies  but  only  very  narrow  peaks  in  the  density  of  states  (DOS)  of  allowed 
energy  levels. 

To answer the first counterargument one can point out that due to the large
probability of charge transfer (CT) in vivo, free charge carriers can be generated quite
easily. As Szent-Gyorgyi has suggested [5] and we have calculated [6], there is a
possibility that with compounds containing two carbonyl groups (like methylglyoxal) a
small amount (0.03 e) of CT may occur between methylglyoxal and a peptide group [6]
(the latter being the electron donor). In vivo there seems to be a much more effective CT
mechanism from the negatively charged PO₄⁻ groups of DNA to the positively
charged side groups of a polypeptide chain (to the guanidinium end groups of
arginine, to protonated lysine side chains, or to histidine cations) in nucleoproteins. It
should be noted that one of the most important nucleoproteins is nucleohistone,
which is rich in arginine, and further, that in DNA (as all quantum mechanical
calculations show [7, 8]) there is an internal CT of -0.05 e from the sugar residue to the
phosphate group. In this way the actual charge on a phosphate group of a
polynucleotide is -1.05 e instead of -1.0 e, which strongly facilitates the postulated CT
from DNA to proteins. Therefore, the gap problem of the conduction in proteins seems to
be eliminated by the described generation of free charge carriers.

Detailed calculations in which four-component (glycine, serine, asparagine, and
cysteine) nonperiodic polypeptide chains with random sequences were treated using
the theory of disordered systems (for the applied so-called negative factor counting
(NFC) technique see Ref. 9) showed that the disorder results in a very strong
broadening of the allowed energy states both in the valence and conduction band regions
[10]. This means that if free charge carriers are obtained in these regions (especially in
the originally empty conduction band region of the polypeptide), by CT from DNA or
otherwise, a strong hopping conduction can be expected in these systems despite the
localization of the wave functions due to the disorder, as detailed calculations have
shown [10]. It is well known from linear algebra that for any arbitrary
energy level the corresponding eigenvector (wavefunction) can be determined with
great precision using the standard inverse iteration technique [11]. In this way, after
determining the individual energy levels in the conduction band region of the
investigated disordered (i.e., having a random sequence) four-component gly, ser, asp, and
cys system in a 1:1:1:1 composition using the previously mentioned negative factor
counting method [9], we were able to determine the wavefunctions belonging to these
systems [10]. Having these wavefunctions it was easy to see that all states in this
disordered chain are strongly localized on a single amino acid residue. The detailed
calculations were done for the first 15 filled and the next 40 unfilled levels in the
conduction band region. Since these calculations were performed on the ab initio
Hartree-Fock level (all electrons and all interactions between them were taken into
account), the energy level spacing between consecutive levels is only 0.003 eV
(smaller than the thermal energy at 300 K, $k_B \cdot 300 = 0.03$ eV). Detailed
considerations have shown that to get one electron of a chain of N = 300 units from one end
to the other, the average energy for hopping from a unit to its first neighbor is
$\Delta E_{i,j} = 0.25$ eV. One can substitute this value in the expression of the primary
jump rate [12] (the number of jumps per second from unit A to unit B)

$$P_{A \to B} = \nu_{\mathrm{phonon}} \exp\left(-\frac{\Delta E_{i,j}}{k_B T}\right) \left( c_{i,r}\, c_{j,s}\, \langle \chi_r^A | \chi_s^B \rangle \right)^2 . \qquad (1)$$

Here $\nu_{\mathrm{phonon}}$ is the acoustic phonon frequency, which in polypeptides corresponds to
the vibrations of the side chains relative to each other, $c_{i,r}$ and $c_{j,s}$ are the dominant
coefficients of those LCAO wavefunctions with energies $\epsilon_i$ and $\epsilon_j$, respectively, which
are localized on A and B, respectively, and finally $\chi_r^A$ is an atomic orbital (AO)
localized  on  the  amino  acid  residue  A.  Taking  ^phonon  =  10’V  (which  is  a  typical 
value  for  acoustic  phonons)  one  obtains  for  PA^B  ~  10s  for  first  neighbors,  109  if  the 
transition  occurs  within  the  same  residue  and  103  for  second  neighbor  jump  rates. 
These values are qualitatively in good agreement with the $P_{A \to B}$ values found in different
disordered solids which conduct through variable range hopping (for details and
references see Ref. 10).
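The inverse iteration technique referred to in this section can be illustrated on a small model. The following is only a minimal sketch: it uses a hypothetical five-site chain with made-up site energies and a made-up nearest-neighbor coupling (not the ab initio Hartree-Fock matrices of Ref. 10), and the function names are chosen here for illustration. Given a shift close to one energy level, repeated solution of (H - shift) w = v converges to the eigenvector of that level, and the squared components show on which site the state is localized.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def matvec(h, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in h]


def inverse_iteration(h, shift, steps=25):
    """Return (energy, eigenvector) of the level of h closest to shift."""
    n = len(h)
    shifted = [[h[i][j] - (shift if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    v = [1.0 / math.sqrt(n)] * n  # arbitrary normalized start vector
    for _ in range(steps):
        w = solve(shifted, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    energy = sum(vi * hvi for vi, hvi in zip(v, matvec(h, v)))  # Rayleigh quotient
    return energy, v


def chain_hamiltonian(site_energies, coupling):
    """Tridiagonal matrix for a chain with nearest-neighbor coupling."""
    n = len(site_energies)
    h = [[0.0] * n for _ in range(n)]
    for i, e in enumerate(site_energies):
        h[i][i] = e
        if i + 1 < n:
            h[i][i + 1] = h[i + 1][i] = coupling
    return h


# Hypothetical disordered chain (arbitrary energy units, illustrative values).
H = chain_hamiltonian([0.0, 1.0, 0.3, 1.7, 0.6], coupling=0.1)
energy, state = inverse_iteration(H, shift=1.69)
weights = [c * c for c in state]
print(f"energy = {energy:.4f}")
print("site weights:", [round(w, 3) for w in weights])
```

In this toy chain the site energies differ by much more than the coupling, so nearly all of the weight sits on a single site, which is the kind of localization described in the text.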

These results mean that if free charge carriers are generated in vivo by CT from
DNA to protein or in vitro by doping with electron donors, proteins can become,
despite their aperiodicity, quite good conductors by a hopping mechanism. In this way
Szent-Gyorgyi’s 45-year-old prediction [1] could be proven with the help of modern
theoretical solid-state physical methods.
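The orders of magnitude involved in such a hopping estimate can be explored numerically. The sketch below assumes the usual thermally activated form, rate = attempt frequency × exp(-ΔE / k_B T) × w², where w stands in for the coefficient-and-overlap factor; the attempt frequency of 10^12 per second and the choice w = 1 are illustrative assumptions made here and are not values taken from Ref. 10 or Ref. 12.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def thermal_energy_ev(temperature_k):
    """Thermal energy k_B * T in eV."""
    return K_B_EV_PER_K * temperature_k


def jump_rate(attempt_frequency_hz, barrier_ev, temperature_k, overlap_factor=1.0):
    """Thermally activated jump rate in jumps per second.

    overlap_factor is a stand-in (assumption) for the product of LCAO
    coefficients and the overlap integral; it enters squared.
    """
    boltzmann = math.exp(-barrier_ev / thermal_energy_ev(temperature_k))
    return attempt_frequency_hz * boltzmann * overlap_factor ** 2


NU_ASSUMED_HZ = 1.0e12  # assumed order of magnitude for an acoustic phonon

print(f"k_B * 300 K = {thermal_energy_ev(300.0):.4f} eV")
for barrier in (0.003, 0.25):  # level spacing and average hopping energy from the text
    rate = jump_rate(NU_ASSUMED_HZ, barrier, 300.0)
    print(f"barrier {barrier:5.3f} eV -> rate {rate:.2e} per second")
```

The first printed line reproduces the thermal energy of about 0.03 eV quoted in the text; the remaining lines show how strongly the Boltzmann factor suppresses the rate when the hopping energy is about ten times the thermal energy.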


Interaction  Between  Insulator  and  Conductor  Biopolymer  Chains 

During a personal discussion, Szent-Gyorgyi theorized that “proteins can form
larger structures” only if they are conductors [4]. This is a good example of his
legendary intuition. Of course there is (and was) no scientist in the world whose every
intuitive idea has been proven correct. But even if only a small fraction of such
intuitive ideas of one great scientist are right, such a person can be ranked as an
extraordinary genius. There is no question that Albert Szent-Gyorgyi achieved that status.

If  one  looks  into  the  theory  of  intermolecular  (or  interchain)  interactions,  one  finds 
that  besides  the  leading  electrostatic  term,  the  polarization  and  dispersion  forces  are 
major  contributors  to  these  interactions. 

If one applies perturbation theory, one can see in second order (and the same holds
in higher orders) that the dispersion energy between systems A and B has the form

$$E_{\mathrm{disp}} = \frac{C}{\Delta E_A + \Delta E_B} . \qquad (2)$$

Here C stands for a fourfold summation of complicated integrals and $\Delta E_A$ and $\Delta E_B$
are excitation energies of systems A and B, respectively. If A and B are atoms or
molecules, $\Delta E_A$ and $\Delta E_B$ will give the energy difference between their highest filled
and lowest unfilled energy levels [see Fig. 1(a)]. If many molecules A and B,
respectively, interact to form the chains polyA and polyB, the corresponding energy
levels broaden into so-called energy bands (continuous regions of allowed energy
levels), of which one is completely filled and the other one completely empty
[see Fig. 1(b)]. In this case both chains are insulators. If, on the other hand, due to
CT both chains have only partially filled bands [see Fig. 1(c)], they become
conductors. As we were able to show, together with the late Kalman Laki [13], in this case
the dispersion interaction (and in a similar way also the polarization interaction)
becomes many orders of magnitude larger, because one can excite electrons within
the partially filled bands ($\Delta E_A \to 0$, $\Delta E_B \to 0$). For chains of finite length
(which is the realistic case) $\Delta E_A$ and $\Delta E_B$ will not be zero, but very small. Still, if the
denominator is very small the expression (2) will become large, and so a large
increase of interaction energy takes place. This is also very important from the point
of view of carcinogenesis, since CT caused by carcinogens can change the population
of the valence or conduction bands, respectively, of both DNA and proteins and
therefore change the dispersion and polarization forces between them. In this way
they can influence the DNA-protein interaction (which controls genetic regulation)
even if they are bound at some distant point along the DNA chain [14]. We can
conclude that Szent-Gyorgyi’s intuition was right in this case also.
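The role of the energy denominator in this argument can be made concrete with a small numerical sketch. It assumes that expression (2) has the simple form of a constant C divided by the sum of the two excitation energies, as the text describes; the gap values used below (5 eV for an insulating chain, 0.005 eV for a finite chain with a partially filled band) are hypothetical and serve only to show the scaling.

```python
def dispersion_energy(c, gap_a_ev, gap_b_ev):
    """Magnitude of a second-order term of the form C / (dE_A + dE_B)."""
    denominator = gap_a_ev + gap_b_ev
    if denominator <= 0.0:
        raise ValueError("excitation energies must sum to a positive value")
    return c / denominator


def enhancement(insulator_gap_ev, conductor_gap_ev, c=1.0):
    """Ratio of the term for two conducting chains to that for two insulating chains.

    The same C is used in both cases (an assumption), so only the
    denominators differ.
    """
    insulating = dispersion_energy(c, insulator_gap_ev, insulator_gap_ev)
    conducting = dispersion_energy(c, conductor_gap_ev, conductor_gap_ev)
    return conducting / insulating


# Hypothetical gaps: 5 eV (filled/empty bands) versus 0.005 eV (partially filled band).
print(f"enhancement factor: {enhancement(5.0, 0.005):.0f}")
```

With these illustrative numbers the term grows by three orders of magnitude, which is the qualitative behavior the paragraph attributes to partially filled bands.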

Carcinogenesis 

Szent-Gyorgyi has emphasized many times that for normal (noncancerous) cell
functioning one needs an easy energy and charge transport [5]. To understand his
idea one should think first of all of the respiratory cycle, which is crucial for the O₂
metabolism of the normal cell (cancerous cells do not show O₂ metabolism; instead,
fermentation takes place). One knows that in this respiratory cycle electrons have to
be transported from one part to another part of different proteins.
Though there are other possible mechanisms for electron transport in a protein (via
H-bonds coupled to proton transport, via aromatic intermediates between different
heme groups, etc.), the easiest way for electron (and energy) transport to occur in a
protein is still via transport along the main polypeptide chain. In the section on
Hopping Conduction in Disordered Proteins, one could see that despite the too large gap
and the disorder in proteins, this is quite possible through effective electron (or, in an
excited state, exciton) hopping. Proteins (and DNA) which show an electron (exciton)
conduction in this way also provide a much quicker and more effective possibility for
signal transfer in these biopolymers, which certainly plays an important role in the
self-regulation of the cell.
Abstract

In  the  study  of  cancer  as  a  complicated  disease  caused  by  many  combinations  of  various  factors,  it  may 
be  of  importance  to  consider  also  any  possible  changes  of  the  water  structure  in  the  environment  of  the 
malignant  cells.  The  occurrence  of  such  changes  has  been  established  experimentally,  for  example,  by  a 
study of the magnetic properties of water by nuclear magnetic resonance (NMR) spectra. It has been found
that  the  protons  in  the  water  surrounding  malignant  cells  have  a  much  longer  spin-lattice  relaxation  time 
than  the  protons  in  the  water  around  normal  cells.  This  indicates  that  the  water  molecules  in  tumor  cells  are 
less  structured  and  able  to  move  more  freely  than  in  normal  tissues,  where,  due  to  the  effect  of  hydrogen 
bonding, water occurs mainly as five- or six-membered rings. This prolongation of the proton spin-lattice
relaxation time may be an important factor in cancer, but further studies are necessary before one can decide with
certainty  whether  it  would  be  possible  to  use  this  effect  to  diagnose  malignant  transformations  at  an  early 
stage.  It  is  suggested  that  changes  in  the  magnetic  properties  of  water  in  a  malignant  tumor  during 
chemotherapy  and  other  treatments  be  monitored  as  control  tools. 

This  paper  provides  some  remarks  prepared  for  the  panel  discussion  on  “Models  of 
Carcinogenesis”  at  the  1987  Sanibel  Symposium  on  Quantum  Biology  and  Quantum 
Pharmacology,  and  it  deals  briefly  with  the  role  of  the  water  environment  in  cancer. 

According to Boyland [1], a rough estimate of the most important sources causing
human cancer indicates that less than 5% have physical origins such as radiation
damage, that less than 5% are caused by external viruses, and that more than 90%
may come from chemicals in our environment. More recent studies [2] have found
that some of our living habits, such as tobacco smoking, drinking of alcohol, the
combination of these two factors, or excess consumption of fats, represent a very large
portion of the causes in the various forms of cancer. Most of these carcinogenic
chemicals enter the body through breathing, eating, and drinking, and, even though
many of them come in water solution, one would not expect that water itself would
play any major role in the study of cancer.

Cancer is a disease characterized by the fact that some cells in the body start
replicating in an uncontrolled fashion. All experimental experience indicates that cancer is
connected with some form of damage to the genetic code and its regulatory behavior,
namely to some change in the DNA molecule as a nucleoprotein in an ordinary body
cell. At the 1986 Sanibel panel discussion on this subject, the following models
of carcinogenesis [3] were particularly discussed: the immunological model, the
virus model, the somatic mutation model, the reading-error model, and the
proton-tunneling model.

In the study of carcinogenesis, the presence of water is often taken for granted and
therefore ignored, although water plays an essential role in all biological systems. It
has been found, however, that in the environment of malignant cells, the system of
water molecules shows a different structure than around normal cells. This may be
seen, for instance, in the nuclear magnetic resonance (NMR) spectra of the water
protons. If one measures the proton spin relaxation time ($T_1$) at 100 MHz in normal and
malignant human tissues, the relaxation times in tumor tissues ($T_{1,\mathrm{tumor}}$) are, on the
average, longer than the relaxation times in normal tissues ($T_{1,\mathrm{normal}}$) (see Ref. 4).
This indicates that the system of water molecules in tumor tissues is less structured
and that the molecules are able to move more freely than in normal tissues.
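What a longer spin-lattice relaxation time means in a measurement can be sketched with the textbook saturation-recovery relation, M_z(t)/M_0 = 1 - exp(-t/T_1). This relation is a standard NMR result and is not stated in the source; the two T_1 values below are hypothetical placeholders chosen only to show the trend, not data from Ref. 4.

```python
import math


def recovered_fraction(time_s, t1_s):
    """Fraction of equilibrium magnetization recovered after saturation."""
    return 1.0 - math.exp(-time_s / t1_s)


# Hypothetical relaxation times in seconds (placeholders, not measured values).
T1_NORMAL_S = 0.5
T1_TUMOR_S = 1.0

for delay in (0.25, 0.5, 1.0, 2.0):
    normal = recovered_fraction(delay, T1_NORMAL_S)
    tumor = recovered_fraction(delay, T1_TUMOR_S)
    print(f"t = {delay:4.2f} s  normal {normal:.3f}  tumor {tumor:.3f}")
```

At every delay the sample with the longer T_1 has recovered less of its magnetization, which is how a prolonged relaxation time would show up in such an experiment.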

In this connection, it is important to study the structure of water in the environment
of biological molecules in general, and in normal cells or tissues in greater detail for
comparison purposes. In our recent work [5], the environmental water
structures in solutions of biological molecules were investigated with particular
attention to the ring structures formed by hydrogen bonding. It was shown that the most
plausible forms of water structure are short-range six-membered connections. Next,
according to our improved water model [6, 7], an equilibrium between six-membered
structures and five-membered ring connections is established, and the cooperative
changes between their structures are observed.


Table I. The values of $\Delta E_{ww}$ (kJ/mol).

Positive hydration          Negative hydration
Ion        Value            Ion        Value
Li⁺         27.2            K⁺          -3.8
Na⁺          3.3            Rb⁺         -6.3
Ca²⁺        32.2            Cs⁺         -8.8
Sr²⁺        10.5            Be²⁺      -184.5
Ba²⁺         1.3            Mg²⁺        -8.8
Sc³⁺        51.9            Al³⁺      -313.4
Y³⁺         41.0            Tl⁺         -7.1
La³⁺        23.0            F⁻         -18.0
Ag⁺          4.2            Cl⁻         -7.5
Zn²⁺        50.6            Br⁻         -7.5
Mn²⁺        42.3            I⁻          -7.9
Fe²⁺        48.5
Co²⁺        48.5
Ni²⁺        51.0
Cu²⁺        49.8
Fe³⁺        51.9
Hg²⁺        10.9
Pb²⁺         5.9
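The sign convention of Table I can be made explicit with a short sketch. It takes a handful of the tabulated values and sorts them into the two columns purely by sign, then ranks them; the dictionary name and helper functions are introduced here for illustration only, and only a subset of the table is included.

```python
# Subset of Table I values in kJ/mol (ions written in plain ASCII).
DELTA_E_KJ_PER_MOL = {
    "Li+": 27.2,
    "Na+": 3.3,
    "Ca2+": 32.2,
    "K+": -3.8,
    "Rb+": -6.3,
}


def hydration_class(delta_e):
    """Column of Table I implied by the sign of the tabulated value."""
    return "positive hydration" if delta_e > 0 else "negative hydration"


def ranked(values):
    """Ions ordered from the largest to the smallest tabulated value."""
    return sorted(values, key=values.get, reverse=True)


for ion in ranked(DELTA_E_KJ_PER_MOL):
    value = DELTA_E_KJ_PER_MOL[ion]
    print(f"{ion:5s} {value:6.1f}  {hydration_class(value)}")
```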
In lowering the temperature from 0°C to -40°C under supercooling conditions, the
short-range six-membered connections increase and become dominant. In this
connection, it is also interesting to observe the existence of many medical and biological
reports about the unusual growth of plants and animals in abnormally cold regions,
where supercooled water is provided in large amounts, but many more studies are
certainly necessary before any definite conclusions about this interesting phenomenon
can be drawn.

We have also investigated the structural effect of various ions in dilute aqueous
solutions. In particular, the quantitative prediction of the strength of the various ions in
breaking and making structures can be illustrated by considering the interaction
between water molecules in the solutions ($\Delta E_{ww}$) [8]. In Table I, we have listed the
values indicating the relative strength of structure-breaking and -making ions. It
should be added here that research on the properties of magnetized water under the
influence of different magnetic fields and various flow conditions of ionic solutions
is in progress. The preliminary results of these experiments indicate that the
structural effects on water observed by increasing the magnetic field are similar to the
effects shown by the addition of structure-making ions or temperature lowering, that is,
a favoring of the short-range six-membered connections, and such structure changes
will also influence, for instance, the surface tension under an increase of applied
magnetic fields. As an example, the surface tensions of CuCl2 salt and Mohr's salt (an
Fe2+ ion salt) have been measured at 25°C by a Fisher Autotensiomat with some
results shown in Figure 1.

Figure 1. The change of surface tension with changing magnetic field strength
in 0.025 M Mohr's salt solution in 0.5 N H2SO4 and in pure water. (●) Mohr's salt; (△) pure
water.

Finally, to make a preliminary test of the assumption that there is some connection
between malignant cells and the water structure of their environment, we have carried
out a study of a cell structure in vitro, in which the water structure was changed by
the addition of a structure-making ion. In this experiment, cultured fibroblast 3T331
cell lines were used, and the material medium was prepared as Eagle’s Modified Eagle
Medium (MEM) plus 5% fetal bovine serum. In Figure 2, it is shown that the
addition of 25 mM Ca2+ ion (one of the strong structure-making ions) obtained by an
electrolytic process in CaCl2 brings a remarkable decrease of the cell numbers from
2 million down to nearly 20,000 in a few days.
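The size of this decrease is easier to judge when expressed as a fold change. The short sketch below uses only the two cell counts quoted in the text; the function names are chosen here for illustration, and no time course is assumed because the text gives none beyond "a few days".

```python
import math


def fold_decrease(initial_count, final_count):
    """How many times smaller the final count is than the initial one."""
    return initial_count / final_count


def log10_reduction(initial_count, final_count):
    """The same change expressed in decades (powers of ten)."""
    return math.log10(initial_count / final_count)


INITIAL_CELLS = 2_000_000  # "2 million" in the text
FINAL_CELLS = 20_000       # "nearly 20,000" in the text

print(f"fold decrease: {fold_decrease(INITIAL_CELLS, FINAL_CELLS):.0f}")
print(f"log10 reduction: {log10_reduction(INITIAL_CELLS, FINAL_CELLS):.1f}")
```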

This decrease could, of course, also be an effect of the Ca ions themselves.
However, when one added the Ca ions in the form of 25 mM CaCl2 solution, which
contains a mixture of both structure-making and structure-breaking ions, one is unable to
control the increase in the number of cells. This experiment was carried out in
collaboration with Professor H. Sequchi, Kochi Medical School, Japan. Even if this in
vitro experiment is not conclusive for what will happen in a living system, it
indicates that it may be worthwhile to continue studies along these lines and to try to
influence the water structure in malignant tumors by various means, to see if such an
approach has any influence on the disease.

The  prolonged  spin-lattice  relaxation  times  of  the  protons  in  the  water  environment 
of  malignant  cells  may  be  an  important  symptom  of  cancer,  and  investigations  as  to 
whether it may be used to diagnose any malignant changes at an early stage should
continue. It should be remembered, however, that a similar effect may occur in
connection with other diseases.

In fact, a similar prolongation seems to occur in the water environment of the beta
cells in diabetes. It is somewhat surprising to learn that, on a cellular or subcellular
level, there are certain similarities between the causes of cancer and the causes of
diabetes. Among the causes of diabetes [9], the following factors are considered:
(1) viruses that destroy the beta cells or cause them to malfunction, (2) autoimmune
reactions that damage the beta cells, (3) environmental chemicals that may destroy
the beta cells or cause them to function improperly, and (4) hereditary factors that
reduce the production of insulin in pancreatic beta cells. In addition, we have now found a
characteristic change in the water structure around malfunctioning beta cells in
diabetes, and further studies may be worthwhile in this connection.

All the results reported at this panel discussion are somewhat preliminary, but
further experiments are in progress. At this point, however, we would like to make the
definite suggestion that one should follow the changes in the magnetic properties of
water in a malignant tumor during chemotherapy and other treatments and use them as
an extra control tool.

It  should  finally  be  observed  that  the  water  structures  discussed  here  and  their 
changes  are  essentially  due  to  the  hydrogen  bonding  and  its  collective  behavior.  A 
hydrogen  bond,  in  this  instance,  is  a  proton  shared  between  two  electron  pairs,  and 
the  theoretical  study  of  this  simple  system  is  still  one  of  the  most  important  problems 
of  quantum  biology. 
Abstract

A theoretical exploration of the intercalative mode of binding with DNA of the antitumor drug
bisantrene and the closely related but inactive drug CL 230487 indicates that both drugs show a preference
for a major groove intercalation to GC sequences. In this mode of binding bisantrene is predicted,
however, to have a greater affinity for DNA than CL 230487 due to a stronger and somewhat different network
of H-bonding interactions with the double-stranded receptor. While this difference in the strength and
pattern of the intercalative association could be considered as possibly related to the striking difference in their
antitumor activity, the experimental observation that both compounds exhibit a practically identical affinity
constant for binding to DNA, confronted with the theoretical evaluation of different affinities for
intercalative association, suggests that CL 230487 could possibly interact with DNA by a different mechanism which
does not lead to antitumor activity. This could perhaps consist of an exterior binding.


Introduction 

The restriction imposed on the use of anthracyclines as antitumor agents by their
dose-dependent cardiotoxicity has prompted the design and synthesis of derivatives or
analogues with the aim of diminishing or abolishing their toxic effect while
maintaining or enhancing their chemotherapeutic activity. The origins of the cardiotoxicity are
unknown. According to the findings of an important group of researchers headed by
Lown [1-3], the cardiotoxic effects could be due to the engagement of the quinoid
ring of the anthracyclines in a redox cycle leading, through intermediate
semiquinones, to the generation of reactive oxygen species capable of producing
peroxidative injury to membrane lipids and DNA lesions. “Second-generation anthracyclines,”
characterized by more negative reduction potentials and thus a lesser aptitude to
produce the supposedly harmful oxygenated free radicals, have been prepared, inspired
by this concept. Outstanding among these new compounds are mitoxantrone
[3,4] and anthrapyrazole [5,6]. These molecules are found, indeed, to be less
cardiotoxic than the classical anthracyclines; it seems, however, that they also have a
reduced potency and spectrum of antitumor activity. This situation could be due to
many factors, one possibility being that, as shown by theoretical exploration [7, 8],
their mode of binding to DNA, although also intercalative, shows significant
differences with respect to that of the classical anthracyclines. A tempting conclusion is
that the antitumor activity could well depend not only on the strength of binding of
a drug to DNA but also on its detailed mode of association [9].
An interesting step along this line of research is the recent preparation and assay of
bisantrene (I), a bisguanylhydrazone of anthracene-9,10-dicarboxaldehyde [10].
The compound possesses a planar anthracene ring, which makes it a suitable candidate
for intercalation into DNA, but is devoid of the quinoid grouping and should thus not
be subject to the redox cycle. In conformity with this situation it is found that
bisantrene intercalates into double-stranded nucleic acids [11-13]. It manifests,
however, also a second, weaker type of interaction with this biopolymer, believed to
consist of an exterior electrostatic binding to the anionic backbone of the DNA helix
[13]. Bisantrene is said to be a low toxicity agent [11] and to manifest “good”
antitumor activity against P-388 leukemia and B-16 melanoma in mice [14]. This drug
seems thus akin to the “second-generation anthracyclines.”

Interestingly, a close derivative of bisantrene, designated as CL 230487 II, although
binding to DNA with practically the same affinity constant as bisantrene [13],
is inactive in the anticancer tests mentioned (Fig. 1).

This situation could possibly be related to the problem, outlined at the beginning of
this paper, of the relationship between the mode of binding to DNA and antitumor activity
of drugs. It is this question that we propose to investigate in this paper by trying
to gain, through a theoretical exploration, more information about the details of
interaction with DNA of I and II.


Figure 1. Chemical formula and atom numbering of bisantrene I and CL 230487 II. (The
dashed lines indicate the separation between constitutive fragments.)

Methodology

The binding mode was investigated with the help of the model double-stranded tetramers
d(GCGC)2 and d(ATAT)2, in which the intercalation is assumed to take place
in the center of the oligomer, namely, between base pairs 2-3' and 3-2' (Fig. 2),
that is, in a pyrimidine (3'-5') purine sequence. This choice is in keeping with the
conclusions of our previous works on related compounds [7-9], where it was shown
that the isomeric oligonucleotides d(CGCG)2 and d(TATA)2, with a purine (3'-5')
pyrimidine sequence at the intercalation site, were significantly disfavored in terms
of their binding affinities. The value of 17° has been adopted for the unwinding angle
($\Delta\alpha$), in analogy to that of mitoxantrone, although a somewhat smaller value of this
angle (14°) is possible [15]. The dihedral angles adopted for the tetranucleotide are
those derived by Miller et al. for this value of $\Delta\alpha$ in the course of their study of
intercalative binding using the AGNAS procedure [16]. In line also with this procedure,
a mixed sugar pucker pattern, C3'-endo-C2'-endo, was adopted at the intercalation
site [17].

As in our previous studies [7,8,18-22], the interactions are computed with rigid
models of the oligonucleotides, fixed in the above-mentioned conformation,
whereas a large flexibility is allowed to the drug, considered as diprotonated on N3(N3')
and N4(N4') of bisantrene, or N2(N2') of CL 230487.

The variations of the conformational energy of the intercalator upon complex formation
are computed with the SIBFA (sum of interactions between fragments computed
ab initio) procedure [23]. Within this methodology, the intercalator is built of
elementary constitutive fragments separated by single bonds, and the variation of the
intramolecular energy upon a conformational change is obtained as the variable part
of the sum of interactions between the fragments, expressed as:

$$\delta E_{\mathrm{conf}} = E_{\mathrm{MTP}} + E_{\mathrm{pol}} + E_{\mathrm{rep}} + E_{\mathrm{disp}} + E_{\mathrm{tor}} \qquad (1)$$

The  intermolecular  oligonucleotide-intercalator  interaction  energies  are  computed 
by  the  SIBFA  2  procedure  [24]  as  a  sum  of  five  terms: 

$$\Delta E_{\mathrm{inter}} = E_{\mathrm{MTP}} + E_{\mathrm{pol}} + E_{\mathrm{rep}} + E_{\mathrm{disp}} + E_{\mathrm{CT}} \qquad (2)$$

Figure 2. The model tetramers.

In expressions (1) and (2), $E_{\mathrm{MTP}}$ and $E_{\mathrm{pol}}$ denote, respectively, the electrostatic and
polarization contributions, computed using a multipolar expansion of the ab initio SCF
molecular wave functions of the fragments; $E_{\mathrm{rep}}$ and $E_{\mathrm{disp}}$ are the repulsion and
dispersion contributions, respectively; $E_{\mathrm{tor}}$ is a torsional energy contribution, calibrated
in [23] for elementary rotations along C-C and C-O bonds; and $E_{\mathrm{CT}}$ is a charge-transfer
contribution (see Refs. 23,24 for details).

Drugs I and II are built out of constitutive subfragments, which are individualized
in Figure 1 by dashed lines. The internal geometry of the fragments (bond lengths and
valence angles) is taken to be the same as in the crystal structure of related compounds.
The conformational energies of the intercalator, either free or in the complex,
were minimized as a function of all its dihedral angles.

The DNA tetramers are built of elementary fragments, following a procedure developed
in our laboratory for the computation of the molecular electrostatic potential
of large biomolecules [25]. The fragments are the nucleic acid bases, the phosphodiester
linkage, and deoxyribose, with the same internal geometry as in standard
B-DNA [26].

The ab initio SCF computations on the constitutive fragments necessary for the application
of the SIBFA procedure [23,24] are performed using our usual basis set
with a zeta exponent of 1.5 on the ammonium hydrogen [27].

The search for the optimal configurations of the oligonucleotide-intercalator
complex is performed by energy minimization [28] of the sum
$\Delta E_{\mathrm{inter}} + \delta E_{\mathrm{conf}} + \delta E_{\mathrm{unstack}}$, the last term measuring the energy expense necessary
to unstack the base pairs at the intercalation site, computed also by the SIBFA
procedure. The variables involved in the minimization are the conformational angles of
the intercalator together with the intermolecular variables defining the relative
orientation of the intercalator with respect to the tetramer.
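
As a minimal sketch of how the quantity being minimized is assembled, the following Python fragment sums the five intermolecular terms of expression (2) and adds the conformational and unstacking costs. The class name, the function name, and all numerical values are hypothetical placeholders chosen for illustration; they are not results of this work, and the actual SIBFA evaluation of each term is far more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionTerms:
    """Components of the intermolecular energy of expression (2), in kcal/mol."""
    e_mtp: float   # electrostatic (multipolar expansion)
    e_pol: float   # polarization
    e_rep: float   # repulsion
    e_disp: float  # dispersion
    e_ct: float    # charge transfer

    def total(self) -> float:
        # Delta E_inter as the plain sum of the five contributions.
        return self.e_mtp + self.e_pol + self.e_rep + self.e_disp + self.e_ct


def energy_balance(inter: InteractionTerms, de_conf: float, de_unstack: float) -> float:
    """Sum minimized in the search: Delta E_inter + delta E_conf + delta E_unstack."""
    return inter.total() + de_conf + de_unstack


# Hypothetical placeholder values (kcal/mol), NOT taken from the paper.
terms = InteractionTerms(e_mtp=-100.0, e_pol=-10.0, e_rep=40.0, e_disp=-30.0, e_ct=-5.0)
balance = energy_balance(terms, de_conf=3.0, de_unstack=12.0)
print(f"Delta E_inter = {terms.total():.1f}, energy balance = {balance:.1f}")
```

In this picture the conformational and unstacking terms are positive costs that partly offset a negative (stabilizing) intermolecular energy, which is why the three quantities are minimized together rather than separately.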

Both major and minor groove intercalative bindings, corresponding to the possible
location of the side chains in either groove, were explored for the two sequences
studied.


Results  and  Discussions 

The results of the computations are reported in Tables I and II. Table I lists the
values of the intermolecular interaction energy $\Delta E_{\mathrm{inter}}$ and of its components, the
value of the conformational energy variations $\delta E_{\mathrm{conf}}$ of I and II with respect to their
intrinsically preferred conformations taken as energy zeros, the energy of unstacking
of the oligonucleotides $\delta E_{\mathrm{unstack}}$, the resulting energy balance
$\delta E = \Delta E_{\mathrm{inter}} + \delta E_{\mathrm{conf}} + \delta E_{\mathrm{unstack}}$, and the energy balance difference, $\delta$, with respect
to the most stable value taken as energy zero. Table II reports the most significant
interatomic distances between H atoms of I or II and binding sites in the tetramers.
Figure 3 gives a representation of the optimized structures of the intercalator with
d(GCGC)2 and d(ATAT)2.

In the optimized conformations of the drug-tetranucleotide complexes, the plane of
the chromophore is parallel to the plane of the base pairs at the intercalation site, with
its long axis perpendicular to the long axis of these pairs. The two side chains of the
drugs are located in the same groove of the oligonucleotides, stretching symmetrically
up and down, toward the two backbone strands of DNA. We have also investigated
conformations of the complexes with the two side chains of the drugs in different
grooves, and with a different orientation of the chromophore. All these conformations
correspond to much weaker intermolecular interaction energies.

Figure 3. Computer-drawn optimized structures of bisantrene with d(GCGC)2 and
d(ATAT)2.

The  results  listed  in  Table  I  indicate  that: 

1. For both drugs the intermolecular interaction energy, $\Delta E_{\mathrm{inter}}$, and the overall energy
balance of the interaction, $\delta E$, show a significant sequence specificity favoring
the tetramer d(GCGC)2 over d(ATAT)2. The preference is, in terms of $\delta E$,
11.5 kcal/mol and 14.2 kcal/mol for I and II, respectively.

2. For both drugs and both tetramers, major groove binding is clearly preferred
over minor groove binding.

3. For both d(GCGC)2 and d(ATAT)2 sequences and both major and minor groove
bindings, the active bisantrene shows a greater affinity for intercalative binding than
does the inactive CL 230487. The preferences for major groove binding, favoring I
over II, are 43.9 and 46.6 kcal/mol in the cases of binding to d(GCGC)2 and to
d(ATAT)2 sequences, respectively. For minor groove binding, these preferences are
16.8 and 14.8 kcal/mol, still favoring I.

4. Altogether, the order of the overall energy balance, $\delta E$, for the intercalative
binding is, for both compounds I and II:

d(GCGC)2 (major groove) > d(ATAT)2 (major groove) > d(ATAT)2 (minor groove) > d(GCGC)2 (minor groove),

the strongest association occurring with the d(GCGC)2 sequence and with the two side
chains of the drugs in the major groove of the tetranucleotide.
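
The preferences quoted in items 1 and 3 can be cross-checked against each other, since they are differences of the same four major groove energy balances. The short Python sketch below performs that check. It assumes that the sequence preferences of item 1 refer to the preferred (major groove) complexes, which the text does not state explicitly; the variable names are illustrative only.

```python
import math

# Item 1: preference for d(GCGC)2 over d(ATAT)2 in terms of the energy
# balance (kcal/mol), assumed here to refer to major groove binding.
gc_over_at = {"I": 11.5, "II": 14.2}

# Item 3: preference for I over II, major groove binding (kcal/mol).
i_over_ii_major = {"GC": 43.9, "AT": 46.6}

# Both differences below equal
#   [dE(II,AT) - dE(II,GC)] - [dE(I,AT) - dE(I,GC)],
# so they must agree if the four underlying energy balances are consistent.
from_item_1 = gc_over_at["II"] - gc_over_at["I"]
from_item_3 = i_over_ii_major["AT"] - i_over_ii_major["GC"]

consistent = math.isclose(from_item_1, from_item_3, abs_tol=0.05)
print(f"item 1: {from_item_1:.1f} kcal/mol, item 3: {from_item_3:.1f} kcal/mol, "
      f"consistent: {consistent}")
```

Both routes give 2.7 kcal/mol, so the quoted preferences close the cycle under the stated assumption.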

Up to this point of the analysis of the results of the intercalative binding, the difference
between the antitumorally active bisantrene and the inactive derivative
CL 230487, expressed in energy terms, appears essentially quantitative, the most efficient
interaction characterizing the active compound. It is obvious, however, that
this situation must also correspond to structural differences in the detailed mode of
binding. In order to elucidate this aspect of the situation we may look into the details
of the intermolecular interactions established between the drugs and the receptor
oligonucleotides (Table II).

The two compounds, I and II, are diprotonated on their side chains. The results of
ab initio computations show that the positive charge is not localized exclusively on
the nitrogen of the ammonium group, but is partially delocalized on the adjacent
atoms. For compound I, the partially charged H atoms are those of N2(N2'),
N3(N3'), and N4(N4'). For II, in addition to the strongly positively charged H atom
of N2(N2'), the H atoms of the methyl groups linked to N3 and N4 are also partially
charged, although significantly less so. These charged hydrogen atoms of the side
chains play an essential role in the interactions with the oligonucleotides that stabilize
the complexes.

From that point of view we may consider the differences in these interactions that
produce (1) a preference of both compounds for major groove intercalation, (2) a
preference of both compounds for interaction with GC sequences, and (3) a better
intercalative interaction of bisantrene than of CL 230487.

Major  Groove  Versus  Minor  Groove  Binding 

The  two  side  chains  of  each  drug  behaving  in  a  symmetrical  fashion,  we  may  limit 
our  attention  to  the  behavior  of  one  of  them  in  each  case. 

Major  Groove  Binding 

Side  chain  A  of  I  forms  two  hydrogen  bonds:  one  between  H  of  N3  and  the  ionic 
oxygen  O,  of  the  central  phosphate  group  P2(dH_0  =  1.77  -  1.80  A),  and  another 
between  H  of  N2  and  the  same  O,  of  P2(d„_0  =  2.50  -  2.53  A).  Some  interactions 
between  compound  I  and  base  pairs  are  also  observed:  between  H  of  N2  and  N,  of  the 
purine  G3  or  A3  of  the  intercalation  site,  between  H  of  N„  and  N7  of  G3,  and  between 
N,  and  H  of  the  amino  group  of  A3.  Due  to  longer  distances  between  the  correspond¬ 
ing  atoms  (dH-N?  or  dN_H  =  3.08  -  3.48  A),  these  interactions  are  weaker. 

Compound  II  interacts  with  the  tetramers  in  a  less  efficient  way  than  I.  In  particu¬ 
lar,  only  one  hydrogen  bond  involving  the  phosphate  group  is  found  for  each  side 
chain:  it  occurs  between  an  H  of  C3  and  the  ionic  oxygen  O,  of  the  phosphate  group 
P2 ($d_{\mathrm{H-O}}$ = 1.68-1.72 Å). The distances between H of N2, a strongly charged
atom, and the ionic oxygen of P2 are too long ($d_{\mathrm{H-O}}$ = 3.2-3.4 Å) to form a normal
hydrogen bond. For d(GCGC)2, two additional bonds are established: the first involves
an H of C4 and N7 of G3 of the intercalation site ($d_{\mathrm{H-N7}}$ = 2.80 Å); the second,
an H of C4 and O6 of G3 ($d_{\mathrm{H-O6}}$ = 2.64 Å). For d(ATAT)2, due to the repulsion of
the amino group of adenine, the H atoms of C4 of the side chain cannot come close
enough to N7 of the base for an effective bond to be formed.

Minor  Groove  Binding 

For both I and II, each side chain forms two hydrogen bonds with electron-rich
sites in this groove. For I, they occur between the H of N2 and O1' of the deoxyribose
linked to bases G3 or A3 ($d_{\mathrm{H-O}}$ = 2.16 Å), and, in an elongated fashion, between H of
N3 and O3' of the deoxyribose linked to bases C2 or T2 ($d_{\mathrm{H-O}}$ = 2.96 Å). For II, the
distances between the corresponding atoms H(C3) and H(N2) with O3' (S-C2 or
S-T2) and O1' (S-G3 or S-A3) are 2.17 Å and 2.41 Å, respectively.

For both I and II, the binding energies in the minor groove are considerably disfavored
with respect to those in the major groove.

Binding  to  GC  Versus  AT  Sequences 

Considering the most efficient bindings of both drugs with these sequences, which
occur through the major groove, we observe that the preference for binding to the GC
oligomer springs both from a better interaction of the side chains with the phosphates
of that sequence and from their parallel better interaction with the nucleic acid bases. This
last result is due essentially to the greater attraction exerted on electrophilic reactants
by N7 of guanine than by N7 of adenine, owing to the greater value of the molecular
electrostatic potential at the former position [29].

Binding of I Versus Binding of II

The less efficient intercalative binding of II as compared to I, considered with the
best receptor, d(GCGC)2, can be accounted for by the less intimate interaction of the
former both with the ribose phosphate backbone and with the bases of the oligonucleotide.
It may be worth stressing that while the principal hydrogen bonds of I involve
NH donating groups, those of II involve only CH donating groups, which, although
activated by the delocalization of the positive charge upon them, do not produce as
effective hydrogen bonds. Altogether, the network of H bonds established by II is
significantly restricted with respect to that of I. In particular, no strong H bond
involving its N2 (or N2') hydrogen could be found for II.

Conclusion 

The present results indicate that although both compounds, bisantrene and CL 230487,
are capable of intercalative binding to double-stranded oligonucleotides, with the same
sequence specificity, they differ in the detailed pattern of the complexation to the
point of producing, for this type of binding, a significantly greater energy of complexation
for bisantrene.

On the basis of this result it could be tempting to assume that the existence of antitumor
activity in I and its absence in II may be related to this difference in the pattern
and strength of intercalative binding. That the story may, however, be more
complex or even substantially different is evident if we recall the experimental result,
quoted in the introduction to this paper, which indicates nearly identical measured
affinity constants for the interaction with DNA of CL 230487 and of bisantrene. In
view of our theoretical result, predicting a substantial difference in the affinity of
these two compounds for the intercalative binding, a plausible suggestion seems to be
that this situation may signify a possible involvement of CL 230487 in a mode of
binding other than intercalation, for example an exterior association with the DNA
backbone, which does not confer antitumor activity. In these circumstances the presence
of this activity in bisantrene and its absence in CL 230487 could be attributed
to differences in binding mode, the presence of the activity being dependent on the
presence or dominance of the intercalative interaction.

The present case would not constitute the only example of closely related compounds
showing a similar affinity constant for DNA but a very different behavior as
antitumor agents. An outstanding example of a similar situation is offered by m-AMSA,
a potent chemotherapeutic drug, and o-AMSA, inactive in this respect [30],
as well as by other compounds of the acridine drugs series [15,31]. We have undertaken
in our laboratory an exploration of the possible competitive exterior binding of
a number of essentially intercalative drugs.

Abstract

Basis set and correlation effects on computed hydrogen bond energies of the negative ion complexes
$\mathrm{AH}_n \cdot \mathrm{AH}_{n-1}^{-1}$, for $\mathrm{AH}_n$ = $\mathrm{NH_3}$, $\mathrm{OH_2}$, and FH, have been evaluated. The addition of diffuse functions on
nonhydrogen atoms to valence double- and triple-split plus polarization basis sets [6-31G(d,p) and
6-311G(d,p)] significantly decreases binding energies by 9-19 kcal/mol, depending on the particular complex
and the level of theory. Adding diffuse functions to hydrogens has a negligible effect, while replacing
the single set of polarization functions on each atom by two sets alters energies by 1 kcal/mol or less.
Electron correlation increases the hydrogen bond energies of these complexes and has a greater effect for
basis sets without diffuse functions. Since the Hartree-Fock energies computed with these basis sets are
already too large, correlation energy calculations should not be performed in these cases. For basis sets
including diffuse functions, the correlation energy contribution to the binding energies of these complexes is
significant, with the Møller-Plesset second-order term being the largest term and having a stabilizing effect
of 3-6 kcal/mol. The third and fourth order terms are smaller, and may be of opposite sign. As a
result, the MP2 and MP4 energies differ by no more than 1 kcal/mol, with the MP2 stabilization energy
being greater except for $\mathrm{N_2H_5^{-1}}$. The computed standard solvation enthalpy of $\mathrm{OH^{-1}}$ by $\mathrm{H_2O}$ based on
either MP4/6-311+G(2d,2p) or MP2/6-31+G(d,p) electronic energies is -26.8 kcal/mol, in excellent
agreement with a recent gas-phase experimental measurement.

Introduction 

Many studies have demonstrated that calculated ab initio association energies are
dependent on the basis set and the theoretical method used for the calculations
[1-28]. The specific details of these dependencies vary with the particular type of
interaction. In an attempt to characterize these dependencies, systematic studies have been
undertaken in this laboratory of the effects on binding energies of augmenting
split-valence plus polarization basis sets, and of electron correlation at various levels of
Møller-Plesset perturbation theory. The first studies were concerned with the effects
of basis set and electron correlation on computed protonation and lithium ion association
energies for a series of oxygen and nitrogen bases [14]. Subsequent studies examined
basis set and correlation effects on the computed binding energies of neutral
and positive ion hydrogen-bonded complexes $(\mathrm{AH}_n)_2$ and $\mathrm{AH}_n \cdot \mathrm{AH}_{n+1}^{+1}$, for
$\mathrm{AH}_n$ = $\mathrm{NH_3}$, $\mathrm{OH_2}$, and FH [27,28]. The present study is an analogous investigation
of basis set and correlation effects on the corresponding negative ion complexes
$\mathrm{AH}_n \cdot \mathrm{AH}_{n-1}^{-1}$ (alternatively, $\mathrm{A_2H}_{2n-1}^{-1}$) for the same hydrides. Previous studies of
the structures and energies of such negative ion complexes at various levels of
theoretical treatment have been reported [29-40]. The performance of a given level
of theory will be judged by comparison with higher levels of theory and, for the case
of $\mathrm{O_2H_3^{-1}}$, by comparison with a recent experimental measurement.

Method 

The standard enthalpy at 298 K for the solvation of the anion by the corresponding
solvent is defined as $\Delta H^{298}$ for the gas-phase reaction

$$\mathrm{AH}_n + \mathrm{AH}_{n-1}^{-1} \rightarrow \mathrm{AH}_n \cdot \mathrm{AH}_{n-1}^{-1} \qquad (1)$$

The electronic contribution to this enthalpy is defined as $\Delta E_e$ for the same reaction.
In this study, electronic energies have been calculated using Hartree-Fock (HF) theory,
with correlation included using Møller-Plesset perturbation theory at second
(MP2), third (MP3), and fourth order (MP4) [41,42]. The correlation energy calculations
exclude the 1s electrons on nonhydrogen atoms.
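
Because the Møller-Plesset energies are cumulative, the contribution of each perturbation order to a binding energy is the difference between successive levels. The Python sketch below illustrates this with the 6-311+G(2d,2p) entries of Table I; the dictionary layout and the function name are illustrative choices, not part of the original work.

```python
# Binding energies (kcal/mol) from Table I with the 6-311+G(2d,2p) basis set,
# in the order HF, MP2, MP3, MP4.
BINDING = {
    "F2H-": (-41.5, -44.8, -45.1, -44.3),
    "O2H3- (HF equilibrium structure)": (-23.3, -27.8, -27.8, -27.7),
    "N2H5-": (-9.7, -13.5, -13.4, -13.7),
}


def order_contributions(hf, mp2, mp3, mp4):
    """Second-, third-, and fourth-order contributions to the binding energy."""
    return (round(mp2 - hf, 1), round(mp3 - mp2, 1), round(mp4 - mp3, 1))


contributions = {name: order_contributions(*e) for name, e in BINDING.items()}
# MP4 minus MP2: positive when the MP2 stabilization is the larger of the two.
mp4_minus_mp2 = {name: round(e[3] - e[1], 1) for name, e in BINDING.items()}

for name, (second, third, fourth) in contributions.items():
    print(f"{name}: 2nd {second:+.1f}, 3rd {third:+.1f}, 4th {fourth:+.1f}, "
          f"MP4 - MP2 {mp4_minus_mp2[name]:+.1f}")
```

For these entries the second-order term is the largest and stabilizing, the higher-order terms are small and of varying sign, and MP2 and MP4 agree to within 1 kcal/mol.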

The electronic energies have been calculated using basis sets derived from a split-valence
plus polarization basis [6-31G(d,p)] [43,44], and a valence triple-split plus
polarization basis [6-311G(d,p)] [45]. These basis sets were augmented with diffuse
functions on nonhydrogen atoms, giving 6-31+G(d,p) and 6-311+G(d,p) [24,25],
and with diffuse functions on all atoms [6-31++G(d,p) and 6-311++G(d,p)]. Another
enhancement was the replacement of the standard set of d functions on nonhydrogen
atoms by two sets, with exponents a factor of 2 larger and smaller than the standard
value [6-31G(2d,p) and 6-311G(2d,p)], and a similar change in the first polarization
space of hydrogen atoms [6-31G(2d,2p) and 6-311G(2d,2p)]. Basis sets derived
from combinations of these enhancements were also employed. No counterpoise estimates
of basis set deficiencies have been made in this work. Recent studies have
suggested that the counterpoise correction does not improve reliability, and probably
should not be done [19,22]. It was noted that in addition to those basis set superposition
effects for which the counterpoise correction is intended to compensate (or
overcompensate), there are more fundamental deficiencies in double-polarized basis sets
which have important effects on the predicted energetics of hydrogen-bonded
systems. The intent in the present study is to evaluate the performance of
"uncorrected" basis sets. It is expected that superposition errors should be small for
the larger basis sets used. The good agreement between the computed and the experimental
enthalpy for reaction (1) for $\mathrm{O_2H_3^{-1}}$ supports this expectation.

All calculations reported in this work were carried out on the optimized Hartree-Fock
6-31G(d) geometries of the monomers and complexes. For $\mathrm{F_2H^{-1}}$ and $\mathrm{N_2H_5^{-1}}$,
calculations were performed on the equilibrium structures of $D_{\infty h}$ and $C_s$ symmetry,
respectively. For $\mathrm{O_2H_3^{-1}}$, both the Hartree-Fock equilibrium $C_1$ structure and a structure
of $C_2$ symmetry have been investigated. The latter structure has a symmetrically
bound proton, and is energetically similar to the $C_1$ structure at correlated levels of
theory. Since the structures of hydrogen-bonded complexes also show some dependence
on the theoretical method [22], Hartree-Fock structures for all complexes
$\mathrm{AH}_n \cdot \mathrm{AH}_{n-1}^{-1}$ have also been obtained with the 6-31+G(d) basis set. Although some
structural differences are found, these do not appear to have large effects on the computed
stabilization energies. Thus, for reaction (1), corresponding MP2/6-31+G(d,p)
energies computed at HF/6-31G(d) and HF/6-31+G(d) geometries differ by 0.1, 0.3,
and 1.1 kcal/mol for $\mathrm{F_2H^{-1}}$, $\mathrm{O_2H_3^{-1}}$, and $\mathrm{N_2H_5^{-1}}$, respectively.

Results  and  Discussion 

Computed electronic negative ion hydrogen bond energies at various levels of
theory are reported in Table I. Relationships derived from the data of Table I are
summarized below. The first seven relationships refer to basis set effects, the last two
to correlation effects.


Table I. Computed electronic negative ion hydrogen bond energies (kcal/mol).^a

Basis set            HF       MP2      MP3      MP4SDQ^b   MP4

$\mathrm{F_2H^{-1}}$
6-31G(d,p)          -57.2    -64.1    -62.4    -62.6      -63.4
6-31+G(d,p)         -42.5    -45.2    -45.5    -44.4      -44.4
6-31++G(d,p)        -42.5    -45.2    -45.5    -44.4      -44.5
6-31G(2d,p)         -56.4    -63.4    -61.4    -61.6      -62.7
6-31G(2d,2p)        -56.4    -63.8    -62.0    -62.2      -63.3
6-31+G(2d,2p)       -41.3    -44.0    -44.4    -43.2      -43.3
6-311G(d,p)         -51.6    -58.1    -56.6    -56.8      -57.7
6-311+G(d,p)        -41.6    -44.3    -44.8    -43.6      -43.6
6-311++G(d,p)       -41.6    -44.3    -44.8    -43.6      -43.6
6-311G(2d,p)        -52.5    -59.5    -57.9    -58.0      -59.1
6-311G(2d,2p)       -52.1    -59.0    -57.2    -57.4      -58.6
6-311+G(2d,2p)      -41.5    -44.8    -45.1    -44.0      -44.3

$\mathrm{O_2H_3^{-1}}$ - $C_1$
6-31G(d,p)          -35.1    -42.2    -40.1    -40.4      -41.3
6-31+G(d,p)         -24.1    -27.7    -27.6    -26.8      -27.2
6-31++G(d,p)        -24.2    -27.9    -27.7    -27.0      -27.5
6-31G(2d,p)         -34.5    -42.0    -39.8    -40.0      -41.2
6-31G(2d,2p)        -34.6    -42.8    -40.6    -40.9      -42.1
6-31+G(2d,2p)       -23.4    -27.5    -27.4    -26.6      -27.3
6-311G(d,p)         -34.6    -42.8    -40.6    -41.0      -42.2
6-311+G(d,p)        -23.6    -27.7    -27.8    -26.9      -27.4
6-311++G(d,p)       -23.7    -27.9    -27.9    -27.1      -27.6
6-311G(2d,p)        -34.7    -43.0    -40.6    -40.8      -42.3
6-311G(2d,2p)       -34.3    -42.8    -40.4    -40.7      -42.3
6-311+G(2d,2p)      -23.3    -27.8    -27.8    -27.0      -27.7

$\mathrm{O_2H_3^{-1}}$ - $C_2$
6-31G(d,p)          -34.5    -43.3    -40.8    -41.1      -42.1
6-31+G(d,p)         -22.6    -27.9    -27.4    -26.6      -27.1
6-31++G(d,p)        -22.6    -28.1    -27.6    -26.7      -27.3
6-31G(2d,p)         -33.7    -43.0    -40.2    -40.4      -41.8
6-31G(2d,2p)        -33.7    -43.8    -41.1    -41.3      -42.8
6-31+G(2d,2p)       -21.8    -27.6    -27.1    -26.2      -27.1
6-311G(d,p)         -33.5    -43.7    -41.1    -41.5      -43.0
6-311+G(d,p)        -22.0    -28.0    -27.8    -26.8      -27.5
6-311++G(d,p)       -22.0    -28.2    -27.9    -26.9      -27.6
6-311G(2d,p)        -33.8    -44.1    -41.3    -41.5      -43.2
6-311G(2d,2p)       -33.2    -43.4    -40.5    -40.7      -42.6
6-311+G(2d,2p)      -21.7    -27.9    -27.5    -26.5      -27.5

$\mathrm{N_2H_5^{-1}}$ ^c
6-31G(d,p)          -19.4    -24.5    -23.2    -23.3      -24.0
6-31+G(d,p)         -10.8    -13.7    -13.5    -13.1      -13.6
6-31G(2d,p)         -18.8    -24.4    -23.0    -22.9      -23.8
6-31G(2d,2p)        -19.2    -25.4    -23.9    -23.9      -24.9
6-31+G(2d,2p)        -9.9    -13.4    -13.2    -12.9      -13.5
6-311G(d,p)         -19.6    -25.6    -24.1    -24.2      -25.2
6-311+G(d,p)        -10.3    -13.7    -13.5    -13.1      -13.7
6-311G(2d,p)        -19.4    -25.8    -24.1    -24.1      -25.2
6-311G(2d,2p)       -19.6    -26.7    -24.9    -25.0      -26.3
6-311+G(2d,2p)       -9.7    -13.5    -13.4    -13.0      -13.7

^a Total energies are available from the author on request.

^b Fourth order Møller-Plesset energy including all single, double, and quadruple substitutions, but
omitting triple substitutions.

^c Calculations employing basis sets with diffuse functions on hydrogens were not carried out for
this complex.

Basis  Set  Effects 

Changing from the Split-Valence 6-31G(d,p) Basis Set to the Valence Triple-Split
6-311G(d,p) Basis. Expanding from a valence double-split to a valence triple-split
basis set has a relatively small effect on the stabilization energies of the
complexes $\mathrm{O_2H_3^{-1}}$ and $\mathrm{N_2H_5^{-1}}$, which change by only 1 kcal/mol or less at Hartree-Fock
and correlated levels of theory. In contrast, this same enhancement decreases
the hydrogen bond energy of $\mathrm{F_2H^{-1}}$ by 5-6 kcal/mol. The large effect in this case
may be attributed to the limitations of the double-split valence basis set for describing
the F atom.

Adding Diffuse Functions on Nonhydrogen Atoms. The addition of diffuse
functions on nonhydrogen (heavy) atoms is once again found to be the single most
important enhancement of valence double- and triple-split basis sets for computing
hydrogen bond energies. This enhancement dramatically decreases the binding energies
of all complexes at all levels of theory, with a larger effect observed with correlation
than at Hartree-Fock. With the double-split basis set, the energy lowering ranges
from 8.6 kcal/mol for $\mathrm{N_2H_5^{-1}}$ at Hartree-Fock to 19.0 kcal/mol for $\mathrm{F_2H^{-1}}$ at MP4.
With the triple-split basis set, the lowering ranges from 9.3 kcal/mol at Hartree-Fock
for $\mathrm{N_2H_5^{-1}}$ to 15.7 kcal/mol at MP2 for the $C_2$ form of $\mathrm{O_2H_3^{-1}}$. At all levels of theory
the energy lowering is greater for the more symmetric $C_2$ form of $\mathrm{O_2H_3^{-1}}$ than for the
$C_1$ form. The decrease in the stabilization energies of these negative ion hydrogen-bonded
complexes is greater than that observed in the corresponding neutral and positive
ion hydrogen-bonded complexes. The importance of diffuse functions for
describing interactions involving negative ions has been noted previously by others,
and is attributed to the ability of diffuse functions to better describe the tails of functions
associated with the lone pairs of electrons on atoms which are the basic sites for
bond formation [13,22,24-26].

Adding Diffuse Functions on Hydrogens. Addition of diffuse functions to hydrogen
atoms has little effect on the stabilization energies of these complexes. The presence
of such functions increases stabilities but by no more than 0.3 kcal/mol at any
level of theory.

Since the 6-31+G(d,p) and 6-311G(d,p) basis sets are the same size, it is appropriate
to compare the stabilization energies computed with each. Corresponding
Hartree-Fock and Møller-Plesset hydrogen bond energies obtained with the
6-31+G(d,p) basis set are lower by 9-16 kcal/mol than those computed with the
6-311G(d,p) basis, and are in better agreement with the energies computed with the
largest basis set employed in this study. Since the presence of diffuse functions on
hydrogen atoms has essentially no effect on stabilization energies, these data demonstrate
that it is important to enhance the 6-31G(d,p) basis set by adding diffuse functions
on nonhydrogen atoms rather than triple-split the valence shell when computing
the stabilization energies of negative ion hydrogen-bonded complexes. The large decrease
in stabilization energies due to the presence of diffuse functions argues for
their absolute necessity for describing interactions involving negatively charged ions.

Splitting d Functions on Nonhydrogen Atoms into Two Sets. Enhancing the
6-31G(d,p) basis set by replacing the set of d polarization functions on nonhydrogen
atoms by two sets decreases stabilization energies by 1 kcal/mol or less at all levels
of theory. In contrast, this same enhancement of the 6-311G(d,p) basis set increases
the stabilization energies of F2H⁻¹ by about 1 kcal/mol, but has little effect on the
hydrogen bond energies of N2H5⁻¹ and O2H3⁻¹.

Splitting p Functions on Hydrogens. The effect of replacing the standard set of p
functions on hydrogens by two sets depends on the starting basis set and on the particular
complex. This enhancement of the 6-31G(d,p) basis has little effect on the
stabilization energies of these negative ion hydrogen-bonded complexes, increasing
them by 1 kcal/mol or less at all levels of theory. With the 6-311G(d,p) basis
the splitting of the p functions decreases the hydrogen bond energies of F2H⁻¹ and
O2H3⁻¹ but increases the hydrogen bond energies of N2H5⁻¹. These changes do not
exceed 1 kcal/mol.

6-31+G(2d,2p) versus 6-311+G(2d,2p). The enhancement of both the valence
double- and triple-split basis sets by the addition of diffuse functions on heavy atoms
and the splitting of the polarization functions on all atoms into two sets results in
computed 6-31+G(2d,2p) and 6-311+G(2d,2p) hydrogen bond energies for each
complex which are similar at corresponding levels of theory. The 6-31+G(2d,2p) and
6-311+G(2d,2p) energies of F2H⁻¹ always differ in the same sense,
but the difference does not exceed 1 kcal/mol. Corresponding differences for the
other complexes are less than 0.5 kcal/mol. Moreover, the differences between corresponding
6-31+G(d,p) and 6-311+G(2d,2p) energies are about 1 kcal/mol at
Hartree-Fock, and no more than 0.5 kcal/mol at correlated levels of theory. Given
the computational expense of the larger basis set and the similarity of results, the
6-31+G(d,p) basis would appear to be the basis set of choice for studies of negative
ion hydrogen bond energies in larger systems.

Additivity. The specific changes in the computed 6-31G(d,p) and 6-311G(d,p)
hydrogen bond energies brought about by each enhancement
may be added to the hydrogen bond energies computed with each of these two starting
basis sets, respectively. If basis set enhancement effects are additive, the resulting
hydrogen bond energies will be equal to those computed directly with the 6-31+G(2d,2p)
and 6-311+G(2d,2p) basis sets. For these negative ion hydrogen-bonded
complexes, the sum of the 6-31G(d,p) hydrogen bond energy and the specific
changes in the stabilization energy brought about by each enhancement approximates
the computed 6-31+G(2d,2p) hydrogen bond energy to within 1.2 kcal/mol at all
levels of theory. With the 6-311G(d,p) basis as the starting basis set, the approximate
values are within 1.1 kcal/mol of the computed 6-311+G(2d,2p) hydrogen bond
energies. It should be recognized that the changes in the hydrogen bond energies
brought about by various enhancements to the basis sets are dominated by the addition
of diffuse functions to nonhydrogen atoms. All other enhancements have relatively
minor effects, and may be in opposite directions. Nevertheless, it is interesting
to note that the same enhancements of these basis sets were more nearly additive in
the corresponding positive ion hydrogen-bonded complexes, where differences of
only 0.8 and 0.2 kcal/mol were found between estimated and calculated stabilization
energies with the 6-31+G(2d,2p) and 6-311+G(2d,2p) basis sets, respectively [28].
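
The additivity test described in this paragraph is simple bookkeeping, and a minimal Python sketch may make it concrete. The function names are illustrative, and the numerical values are hypothetical placeholders chosen only to show the arithmetic; they are not data from this study.

```python
def additive_estimate(start_energy, changes):
    """Starting-basis hydrogen bond energy plus the sum of the changes
    produced by each single enhancement (all in kcal/mol)."""
    return start_energy + sum(changes.values())


def additivity_error(start_energy, changes, direct_energy):
    """Additive estimate minus the energy computed directly with the
    fully enhanced basis set."""
    return additive_estimate(start_energy, changes) - direct_energy


# Hypothetical placeholder values, NOT data from this study.
# Stabilization energies are written as negative numbers, so a positive
# change corresponds to a smaller binding energy.
changes = {
    "diffuse functions on heavy atoms": 12.0,
    "two d sets on heavy atoms": 0.5,
    "two p sets on hydrogens": -0.4,
}
estimate = additive_estimate(-40.0, changes)
error = additivity_error(-40.0, changes, direct_energy=-28.5)
print(f"estimate = {estimate:.1f} kcal/mol, error = {error:+.1f} kcal/mol")
```

An error near zero would indicate additive enhancements; the text reports deviations of up to 1.2 and 1.1 kcal/mol for the two starting basis sets.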

Correlation  Effects 

It  is  apparent  from  the  data  presented  thus  far  that  diffuse  functions  in  the  basis  set 
are  absolutely  necessary  for  adequately  describing  the  stabilization  energies  of  these 
negative  ion  hydrogen-bonded  complexes.  The  presence  of  such  functions  leads  to 
significant  decreases  in  binding  energies.  In  contrast,  the  overall  effect  of  electron 
correlation  is  to  increase  the  binding  energies  of  these  complexes,  with  the  increase 
being  significantly  larger  when  computed  with  basis  sets  without  diffuse  functions. 
Thus  for  these  basis  sets,  correlation  further  increases  the  stabilization  energies  which 
are  already  too  large  at  Hartree-Fock.  Therefore,  correlation  calculations  should  not 
be  done  in  these  cases.  In  the  following  analyses  of  the  effects  of  electron  correlation 
on  stabilization  energies,  comments  will  be  limited  to  trends  observed  for  basis  sets 
which  contain  diffuse  functions  on  nonhydrogen  atoms. 

Inclusion of Correlation. The correlation energy contribution to the hydrogen
bond energy leads to an increase in the binding energies of all complexes. The correlation
contribution varies with the particular complex, and is a significant part of the
total stabilization energy. At MP4/6-311+G(2d,2p) it ranges from 2.8 kcal/mol
(7% of the Hartree-Fock energy) for F2H⁻¹, to 5.8 kcal/mol (27%) for the C2 form of
O2H3⁻¹, and 4.0 kcal/mol (41%) for N2H5⁻¹.
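
As a consistency check on the percentages quoted here, the F2H⁻¹ figures can be combined with the MP4/6-311+G(2d,2p) stabilization energy of 44.3 kcal/mol (in magnitude) quoted in the Comparisons section. The sketch below assumes that each percentage is the correlation contribution divided by the Hartree-Fock binding energy, and that the Hartree-Fock value equals the MP4 value minus the correlation contribution; the function names are illustrative.

```python
def correlation_percent(correlation, total):
    """Correlation contribution as a percentage of the Hartree-Fock
    binding energy, taking HF = total - correlation (magnitudes, kcal/mol)."""
    hartree_fock = total - correlation
    return 100.0 * correlation / hartree_fock


def implied_hartree_fock(correlation, percent):
    """Hartree-Fock binding energy implied by a contribution and its percentage."""
    return correlation / (percent / 100.0)


f2h_percent = correlation_percent(2.8, 44.3)   # about 6.7, i.e. 7% when rounded
o2h3_hf = implied_hartree_fock(5.8, 27.0)      # about 21.5 kcal/mol
n2h5_hf = implied_hartree_fock(4.0, 41.0)      # about 9.8 kcal/mol
```

Under these assumptions the F2H⁻¹ numbers are mutually consistent, and the implied Hartree-Fock binding energies of the other two complexes are only as precise as the rounded percentages allow.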
Hydrogen Bond Energies at Various Levels of Correlation. The second-order
contribution to the correlation energy increases the stability of all complexes, and is
significantly larger than any other correlation term. The MP2 correlation contribution
ranges from about 3 kcal/mol for F2H⁻¹ to 6 kcal/mol for the C2 form of O2H3⁻¹.
The MP3 and MP4 correlation energy contributions to the hydrogen bond energies
are smaller than the MP2 contribution, and may be of the same or opposite sign, depending
on the particular complex. The MP2 and MP4 stabilization energies of each
complex differ by no more than 1 kcal/mol when computed with basis sets derived
from either the 6-31G(d,p) or 6-311G(d,p) basis sets. In general, the stabilization
energy computed at MP2 is larger than at MP4, except for N2H5⁻¹, in which case the
MP4 energies may be slightly larger. The triples contribution to the fourth-order term
stabilizes these complexes by 1 kcal/mol or less.

Comparisons 

An extensive study of F2H⁻¹ has been reported [22] recently in which the optimized
geometry of this complex was computed with correlation at MP2 employing
the 6-311++G(2d,2p) basis set. A single-point MP4 calculation with the
6-311++G(3df,3pd) basis set was then carried out. This basis set contains three sets
of d functions and a set of f functions on nonhydrogen atoms, and three sets of p
functions and a set of d functions on hydrogens. The computed stabilization energy is
-45.6 kcal/mol. This compares with the MP4/6-311+G(2d,2p) energy of
-44.3 kcal/mol obtained in the present study.

The experimental value of the gas-phase reaction enthalpy for the solvation of
OH⁻¹ by H2O has been measured recently, and found to be -26.8 kcal/mol [46].
Combining the MP4/6-311+G(2d,2p) electronic association energy for the Hartree-Fock
equilibrium Cs structure of O2H3⁻¹, the zero-point and thermal vibrational
energy corrections at HF/6-31G(d) for this structure, and the remaining thermal
terms [47], leads to the same computed enthalpy of -26.8 kcal/mol. The enthalpy
computed from the MP2/6-31+G(d,p) energy is also -26.8 kcal/mol.
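
The way such an enthalpy is assembled can be sketched in Python. Only the translational, rotational, and PV terms are evaluated here, under the usual ideal-gas and rigid-rotor assumptions for a linear ion (OH⁻¹) and a nonlinear molecule (H2O) forming a nonlinear complex; whether Ref. [47] treats the remaining thermal terms in exactly this way is an assumption. The electronic and vibrational inputs are left as arguments because their values are not given in the text.

```python
R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def classical_thermal_terms(temperature=298.15, monomers_linear=(True, False)):
    """Translational + rotational + PV change for A + B -> AB (kcal/mol),
    ideal gas and rigid rotor, with a nonlinear complex AB."""
    rt = R_KCAL * temperature
    d_trans = 1.5 * rt - 2 * 1.5 * rt            # one body instead of two
    rot_monomers = sum(1.0 if linear else 1.5 for linear in monomers_linear)
    d_rot = (1.5 - rot_monomers) * rt            # complex minus monomers
    d_pv = -rt                                   # change in moles of gas is -1
    return d_trans + d_rot + d_pv


def association_enthalpy(d_e_elec, d_zpe, d_e_vib_thermal, temperature=298.15):
    """Sum of electronic, zero-point, thermal vibrational, and classical terms."""
    return (d_e_elec + d_zpe + d_e_vib_thermal
            + classical_thermal_terms(temperature))


classical = classical_thermal_terms()  # -3.5 RT at 298.15 K
```

With these assumptions the classical terms amount to -3.5 RT, roughly -2.1 kcal/mol at 298.15 K, so the vibrational corrections and the electronic energy carry the rest of the computed enthalpy.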

Conclusions 

In this study, basis set and correlation effects on the computed hydrogen bond
energies of the negative ion hydrogen-bonded complexes AHn · AHn-1⁻¹ for
AHn = NH3, OH2, and FH, have been evaluated. The addition of diffuse functions
on nonhydrogen atoms to valence double- and triple-split plus polarization basis sets
is absolutely necessary for adequately describing binding energies, which are
decreased by 9-19 kcal/mol, depending on the particular complex and the level of
theory. The effect is greater with correlation than at Hartree-Fock. Addition of diffuse
functions to hydrogen atoms has a negligible effect on stabilization energies,
while replacing the single set of polarization functions on each atom by two sets alters
energies by 1 kcal/mol or less. In contrast to the energy-lowering effects of augmenting
the 6-31G(d,p) and 6-311G(d,p) basis sets, the overall effect of electron
correlation is to increase hydrogen bond energies. Thus, for basis sets without diffuse
functions, correlation further increases binding energies which are already too large
at Hartree-Fock. Therefore, correlation calculations should not be done in these
cases. For basis sets including diffuse functions, the correlation energy contribution
to the hydrogen bond energies of these complexes is significant, with the Møller-Plesset
second-order term being the largest term and having a stabilizing effect of
3-6 kcal/mol. The third- and fourth-order terms are smaller and may be of the
opposite sign. As a result, the MP2 and MP4 energies differ by no more than
1 kcal/mol, with the MP2 energy being greater except for N2H5⁻¹. The computed
standard solvation enthalpy of OH⁻¹ by H2O based on either MP2/6-31+G(d,p) or
MP4/6-311+G(2d,2p) electronic energies is -26.8 kcal/mol, in excellent agreement
with a recent gas-phase measurement.
I. Introduction

Cyclic ethers are significant in both nonbiomedical and biomedical areas. Cyclic
ethers undergo cationic polymerization. Oxetanes (the four-member rings) substituted
with exotic energetic substituents form energetic polymers. The initiation step is governed
by the basicity of the cyclic ether.

We have previously carried out ab initio MODPOT/VRDDO SCF calculations and generated
electrostatic molecular potential contour (EMPC) maps around these energetic-substituted
ethers which predicted their propensities to polymerize even prior to the
synthesis of the monomers themselves [1]. The subsequent propagation step in
cationic polymerization involves the attack of protonated oxetane on oxetane with
concomitant opening of the oxetane ring. For opening the oxetane and protonated oxetane
rings and for the reactions between them we carried out ab initio MRD-CI
(multireference double excitation-configuration interaction) calculations (by the method of
Buenker and Peyerimhoff, Ref. 2) based on localized molecular orbitals, including
explicitly in the CI the localized occupied and virtual molecular orbitals in the reaction
region and folding the remainder of the occupied localized molecular orbitals
into an effective CI Hamiltonian [3-5].

In the biomedical area, oxetanes have been shown to induce the enzyme aryl
hydrocarbon hydroxylase [6], which is also involved in metabolic activation of polycyclic
aromatic hydrocarbon (PAH) carcinogens.

The metabolic activation of polycyclic aromatic hydrocarbons (PAHs) from precarcinogens
to proximate carcinogens to ultimate carcinogens involves epoxidation of
the PAHs [7] and opening of the epoxide rings, either protonated or nonprotonated.

We have previously published detailed ab initio MODPOT/VRDDO/MERGE SCF calculations
on the potent carcinogens benzo(a)pyrene (BP) and 3-methylcholanthrene and
their metabolites (proximate carcinogens: epoxides and dihydrodiols; ultimate
carcinogens: dihydrodiol epoxides, protonated dihydrodiol epoxides, and their
ring-opened species) [8,9] and on the EMPC maps generated around these molecules
[10-12]. In our original article on SCF calculations on the BP metabolites, we mentioned
that as the epoxide ring is opened there appears to be a mixing of configurations.

In connection with protonation of PAH epoxides and dihydrodiol epoxides, we also
previously pointed out [13] that even though epoxide + H+ is the reaction between
two closed-shell ground-state species, a single-determinant SCF calculation (even an
ab initio single-determinant SCF calculation) is insufficient to describe properly the
protonation process of any molecule. The reason is that closed-shell molecule + H+
is an ion-molecule reaction. There are two types of ion-molecule reactions:

A+ + B     IP(A) < IP(B)     (1)

where the ionization potential (IP) of A is less than that of B. In this case there is the
possibility (but not the guarantee) that a single-determinant ab initio SCF calculation
could yield the correct potential energy surface behavior.

A+ + B     IP(A) > IP(B)     (2)

where the ionization potential of A is greater than that of B. (This is the case for all
protonation reactions.) In this case there has to be at least a pair of potential energy
surfaces (singlet and triplet) which arise from the lower energy asymptotic pair
A + B+. In our original article we sketched schematically these lower surfaces as
being repulsive. The physical argument about protonation being an ion-molecule
reaction of type 2 involving multipotential surfaces is general and applicable to protonation
reactions of all closed-shell molecules whose ionization potentials are lower
than that of the hydrogen atom.
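
The classification into types 1 and 2 depends only on the two ionization potentials, as the following Python sketch illustrates. The function names are illustrative, and the molecular ionization potential used in the example (10.0 eV) is a hypothetical placeholder rather than a value from this work; the hydrogen-atom value is the standard 13.598 eV.

```python
IP_HYDROGEN_EV = 13.598  # ionization potential of the hydrogen atom, eV


def ion_molecule_type(ip_a, ip_b):
    """Return 1 if IP(A) < IP(B), 2 if IP(A) > IP(B), for the reaction A+ + B."""
    if ip_a < ip_b:
        return 1
    if ip_a > ip_b:
        return 2
    raise ValueError("equal ionization potentials: asymptotes are degenerate")


def lower_asymptote(ip_a, ip_b):
    """Asymptotic pair of lower energy: the charge sits on the species
    with the smaller ionization potential."""
    return "A+ + B" if ip_a < ip_b else "A + B+"


def asymptote_gap(ip_a, ip_b):
    """Energy separation of the two asymptotes (same units as the inputs)."""
    return abs(ip_a - ip_b)


# Protonation (A = H) of a hypothetical closed-shell molecule with IP = 10.0 eV.
reaction_type = ion_molecule_type(IP_HYDROGEN_EV, 10.0)
lowest_pair = lower_asymptote(IP_HYDROGEN_EV, 10.0)
```

For any molecule whose ionization potential lies below 13.598 eV, the sketch returns type 2 and places the neutral hydrogen atom plus molecular cation below the proton plus neutral molecule.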

For the present presentation, in the panel on Carcinogenesis, we carried out ab initio
MODPOT MRD-CI calculations to investigate two points: (a) the multipotential
surface character of protonation reactions; (b) the character of the dominant configurations
as an epoxide ring is opened.

II.  Methodology 

The ab initio MODPOT/VRDDO/MERGE procedures are detailed in Ref. 14:

MODPOT: ab initio effective core model potentials which enable one to treat only the
valence electrons explicitly, yet accurately;

VRDDO: a charge-conserving integral prescreening evaluation procedure (which we named
VRDDO, variable retention of diatomic differential overlap) which decides
whether the magnitude of the integrals in a particular block is sufficiently
large to warrant calculating explicitly (especially efficient for spatially
extended systems);

MERGE: to save and reuse common skeletal integrals efficiently.

The ab initio MRD-CI calculations were carried out by the procedure of Buenker and
Peyerimhoff [2]. First a small CI is run (several hundred to a thousand configurations),
and from this the most important configurations are chosen as reference configurations
(up to 80 configurations can be included as reference configurations). All single
and double excitations are generated relative to all the reference configurations, and the
energy contribution of each configuration is estimated by a perturbation procedure.
All configurations contributing more energy than a certain threshold are included explicitly
in the CI calculation (E_CI). The energy contribution of all of the configurations
not included explicitly in the CI calculation is added back in by a perturbative procedure
and extrapolated (E_EXT), and finally a Davidson-type correction can be added to
account for size consistency (E_FULL).
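
The selection step can be illustrated with a toy Python sketch. It is not the Buenker-Peyerimhoff implementation: the discarded contributions are simply summed rather than extrapolated, the Davidson-type formula is written in a generic multireference form that is assumed here, and all function names and numbers are hypothetical.

```python
def select_configurations(estimated_lowerings, threshold=10e-6):
    """Split configurations by their perturbatively estimated energy
    lowering (hartree). Returns the indices kept for the explicit CI and
    the summed estimate of everything that was discarded."""
    selected = [i for i, de in enumerate(estimated_lowerings)
                if abs(de) >= threshold]
    discarded_sum = sum(de for de in estimated_lowerings
                        if abs(de) < threshold)
    return selected, discarded_sum


def extrapolated_energy(e_ci, discarded_sum):
    """Simplified stand-in for E_EXT: add back the discarded estimates."""
    return e_ci + discarded_sum


def davidson_type_estimate(e_ext, e_ref, reference_weight):
    """Generic Davidson-type correction (assumed form): scale the
    correlation energy by the weight missing from the reference space."""
    return e_ext + (1.0 - reference_weight) * (e_ext - e_ref)


# Hypothetical estimated lowerings in hartree; threshold is 10 microhartrees.
kept, rest = select_configurations([-3.2e-4, -5e-6, -2.0e-5, -1e-6])
```

Here two of the four hypothetical configurations exceed the 10 microhartree threshold and would enter the explicit CI, while the other two contribute only through the perturbative sum.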

For a typical MRD-CI run for the protonation pathway of oxetane, 231,721 CSFs
(configuration state functions) were generated from 28 reference configurations and
~4500 CSFs were selected for explicit inclusion in the CI calculation (using a
threshold of 10 microhartrees). The MRD-CI calculations were carried out from an
R(O-H+) distance of 1.5 bohrs to a distance of 10.0 bohrs. We carried out the MRD-CI
calculations for both the linear in-plane (C2v) attack and the out-of-plane Cs attack
without imposing any symmetry for the CI wave functions, for consistency, since even
a slight further distortion in the direction of the proton attack will destroy any symmetry.
Thus we took 5 roots of the CI matrix. However, while we did not impose
symmetry in the CI calculation, the results still contain the symmetry information. To
facilitate a clear transformation from C2v to Cs symmetry notation we describe the
symmetry operations for both, commensurate with Herzberg's conventions [15]: (a)
For C2v attack the oxetane molecule is in the xz plane with the oxygen along the z
axis, and the two reflection planes are σv and σv'. The in-plane orbitals are a1 and b1
and the out-of-plane orbitals are a2 and b2; (b) For Cs attack the oxetane molecule is
in the xz plane, this time with the oxygen along the x axis, and the reflection plane is
σh. The a1 and b2 orbitals in C2v symmetry correspond to a' orbitals in Cs symmetry
and the b1 orbital in C2v symmetry corresponds to an a'' orbital in Cs symmetry.
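
The stated correlation between C2v and Cs labels follows from the C2v character table once the retained reflection plane is chosen. The Python sketch below assumes the standard C2v character table with b1 symmetric under reflection in the xz plane, and labels the planes in the C2v axis convention only; the relabelling of axes described for the Cs attack is not modelled.

```python
# Characters of the four C2v irreducible representations under the
# operations E, C2, sigma(xz), sigma(yz).
C2V_CHARACTERS = {
    "a1": {"E": 1, "C2": 1, "sigma_xz": 1, "sigma_yz": 1},
    "a2": {"E": 1, "C2": 1, "sigma_xz": -1, "sigma_yz": -1},
    "b1": {"E": 1, "C2": -1, "sigma_xz": 1, "sigma_yz": -1},
    "b2": {"E": 1, "C2": -1, "sigma_xz": -1, "sigma_yz": 1},
}


def correlate_to_cs(retained_plane):
    """Map each C2v label to a' (symmetric) or a'' (antisymmetric) under
    the single reflection plane that survives in Cs."""
    return {
        irrep: "a'" if chars[retained_plane] == 1 else "a''"
        for irrep, chars in C2V_CHARACTERS.items()
    }


# Retaining the plane under which b2 is symmetric reproduces the mapping
# quoted in the text: a1, b2 -> a' and b1 -> a''.
cs_labels = correlate_to_cs("sigma_yz")
```

Retaining the other plane instead would send a1 and b1 to a', which is the grouping of in-plane orbitals given for the C2v attack.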

For opening the nonprotonated oxirane or protonated oxirane ring we carried out
the MRD-CI calculations using canonical molecular orbitals including the entire valence
space. From the 23,389 CSFs generated, 1675 CSFs were included explicitly in
the CI calculation (a threshold of 10 microhartrees). The energies of remaining configurations
were added back in by a perturbation procedure, and a Davidson-type correction
was added for size consistency.
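
The degree of truncation achieved by the 10 microhartree threshold can be read off from the counts quoted in this section. A short Python sketch of the arithmetic follows; the oxetane count of 4500 is the approximate figure given in the text.

```python
def selected_fraction(n_selected, n_generated):
    """Fraction of generated CSFs treated explicitly in the CI."""
    return n_selected / n_generated


oxetane_fraction = selected_fraction(4500, 231721)   # roughly 1.9%
oxirane_fraction = selected_fraction(1675, 23389)    # roughly 7.2%
print(f"oxetane: {oxetane_fraction:.1%}, oxirane: {oxirane_fraction:.1%}")
```

In both runs only a few percent of the generated configurations are diagonalized explicitly, and the remainder enter through the perturbative estimate.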

III.  Results  and  Discussion 

A.  Protonation  Paths  of  Oxetanes  Involve  Multipotential  Surfaces 

As mentioned above, all protonation reactions are ion-molecule reactions A+ + B
of type 2 where the ionization potential of A (the H atom) is higher than that of B (the
molecule). Thus there should be potential energy surfaces arising at the asymptotes
from the lower energy asymptotic pair of species A + B+.

We carried out very detailed ab initio MRD-CI calculations for proton attack on oxetane.
The primary focus of these MRD-CI studies was to demonstrate that the lowest
asymptote for dissociation of a protonated species (BH+) was to B+ + H and not to
B + H+ (which is the asymptote pair to which a single-determinant SCF wave function
would dissociate). There are calculations reported even now in which the path
for protonation and deprotonation is depicted incorrectly as the single-determinant
SCF potential energy surface. While applications of correlation to protonation and
proton transfer reactions have been reported in the past literature [16-22], none of
these were MRD-CI calculations aimed at demonstrating the multideterminant and the
multipotential potential energy surface character of the protonation or dissociation of
the protonated species. Also, it is not possible to correct the deficiency in a
single-determinant SCF potential energy curve for the multideterminant and multipotential
surface character by subsequently using MBPT (even to any order on top of a
single-determinant SCF wave function). MBPT only corrects the calculated potential curve by
perturbation theory for a correlation energy correction to that curve. To use MBPT
properly for the dissociation of BH+, it would first be necessary to carry out MRD-CI
calculations and then to apply MBPT separately to each of the multireference CI
surfaces (with proper correction for intruder states).

The MRD-CI calculations were carried out using localized orbitals (obtained by a
Boys localization subsequent to the SCF). This localization leads to much more compact,
desirable molecular orbitals for the CI calculations. For this series of protonation
calculations we included the entire valence orbital space (occupied and virtual).
Thus, this is equivalent by a unitary transformation to using the canonical orbitals in
the MRD-CI. Our previous ab initio MRD-CI [4,5] (and additional ab initio MC-SCF/CI
derivative geometry optimization calculations [23] using the GAMESS program [24])
calculations on protonated oxetanes had indicated that the equilibrium geometry for a
protonated oxetane was C2v with the oxetane ring planar and the O-H+ bond in the
plane of the molecule along the C2 axis. We carried out the MRD-CI calculations for a
linear attack on oxygen by the proton in the plane and also for an out-of-plane attack
by the proton along an oxygen lone pair bond direction. The potential energy curves
are shown in Figures 1 and 2. The curves are labelled on the asymptotes to the species
to which they dissociate smoothly. The results shown for the singlet states
(Fig. 1) validate our original hypothesis of a lower singlet potential energy surface
arising from the lower energy pair of species oxetane+ + H.

The results in Figure 1 also indicate an even greater complexity for the potential
energy surfaces than we had originally hypothesized. The lowest energy root from
1.5 bohrs to 4.5 bohrs (including the equilibrium internuclear O-H+ distance of protonated
oxetane at 2.0 bohrs) is ¹A1. At 4.5 bohrs the lowest root becomes ¹B1, leading to an
asymptote of oxetane+ (ground state) + H. At 4.5 bohrs the ¹A1 curve continues as
the 2nd root of the CI matrix smoothly to oxetane+ (first excited singlet state) + H.
The third root of the CI matrix, also a ¹A1 state, has a minimum at 2.0 bohrs, a hump
at 2.75 bohrs and then continues down to the separated products oxetane (ground
state) + H+. The behavior of the next two higher roots of the CI matrix is also
complicated.
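
The change in character of the lowest root can be illustrated with a generic two-state model, sketched below in Python. This is a textbook 2 x 2 illustration and not the MRD-CI calculation itself: state 1 stands for a proton-plus-molecule configuration, state 2 for a hydrogen-atom-plus-cation configuration, and the energies and coupling are hypothetical.

```python
import math


def adiabatic_roots(h11, h22, h12):
    """Lower and upper eigenvalues of the symmetric matrix [[h11, h12], [h12, h22]]."""
    mean = 0.5 * (h11 + h22)
    half_gap = math.hypot(0.5 * (h11 - h22), h12)
    return mean - half_gap, mean + half_gap


def weight_of_state1_in_lower_root(h11, h22, h12):
    """Squared coefficient of state 1 in the lower eigenvector."""
    denom = math.hypot(h11 - h22, 2.0 * h12)
    if denom == 0.0:
        return 0.5  # fully degenerate, uncoupled case
    return 0.5 * (1.0 - (h11 - h22) / denom)


# Hypothetical energies (arbitrary units) with a fixed coupling of 0.05:
# state 1 lies lower near equilibrium, state 2 lies lower at large separation.
short_range = weight_of_state1_in_lower_root(-1.0, -0.5, 0.05)
long_range = weight_of_state1_in_lower_root(0.0, -0.2, 0.05)
```

In this toy model the lowest root is dominated by state 1 at short range and by state 2 at long range, whereas a wave function constrained to a single configuration would follow state 1 throughout.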

Similar results are obtained for the MRD-CI calculations where the proton is attacking
oxetane from an out-of-plane lone pair direction.

The MRD-CI results for the protonation pathway also confirm our earlier MRD-CI results
that the equilibrium geometry for protonated oxetane has the proton on the oxygen
linear and in the plane of the oxetane ring.

In Tables I and II are shown the SCF and MRD-CI energies as a function of H-O distance
in the protonation and deprotonation of oxetane.
Figure 1. Protonation of oxetane along the O1-H+ bond (head on, in plane). Ab initio
MODPOT MRD-CI extrapolated CI energies (a.u.), singlet state.

Figure 2. Protonation of oxetane. H+ approaching oxetane along the lone pair direction.
Ab initio MODPOT MRD-CI extrapolated CI energies (a.u.), singlet state.
The MRD-CI results show that the ground state equilibrium geometry of ¹A1 protonated
oxetane goes to the separated (¹A1) products oxetane+ (²A1) + H. To what
then do the single-determinant SCF calculations for protonation which one sees carried
out and published correspond? We laid the ¹A1 potential surface for the
single-determinant SCF calculation over the MRD-CI curves (using the same energy value for
the equilibrium geometry minimum of protonated oxetane). The SCF curve connects
smoothly to the asymptote oxetane + H+; however, the SCF curve starts to veer
higher than the correct MRD-CI lowest ¹A1 curve at only 2.75 bohrs and continues to
veer higher until it reaches the asymptote of the second highest MRD-CI ¹A1 curve,
oxetane + H+. The single-determinant SCF wave function begins to deviate in fundamental
character from the correct MRD-CI wave function. The form of the
single-determinant SCF Hamiltonian forces this incorrect behavior. As an additional check
we calculated the gross atomic population on the hydrogen from the SCF wave functions,
where it goes down smoothly from 0.51 at the equilibrium geometry of
2.0 bohrs to 0.00 at 10.0 bohrs, and from the lowest MRD-CI wave functions, where it
goes up from 0.51 at the equilibrium geometry of 2.00 bohrs to 1.00 at 10.0 bohrs.
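
The population check translates directly into an assignment of the dissociation products: a hydrogen carrying no electrons is a bare proton, and one carrying a single electron is a neutral atom. A minimal Python sketch, in which the tolerance of 0.05 electrons is a hypothetical choice:

```python
def hydrogen_fragment(population, tolerance=0.05):
    """Identify the hydrogen fragment from its gross atomic population."""
    if abs(population) <= tolerance:
        return "H+"
    if abs(population - 1.0) <= tolerance:
        return "H"
    return "partially transferred"


scf_limit = hydrogen_fragment(0.00)       # SCF wave function at 10.0 bohrs
mrdci_limit = hydrogen_fragment(1.00)     # lowest MRD-CI root at 10.0 bohrs
equilibrium = hydrogen_fragment(0.51)     # both methods at 2.0 bohrs
```

The two wave functions agree at the equilibrium geometry but separate to different products, oxetane + H+ for the SCF function and oxetane+ + H for the lowest MRD-CI root.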

The implications of these research results are general. All protonation reactions
H+ + B even of closed-shell ground-state molecules (where the ionization potential
of the B molecule is less than that of the hydrogen atom) can only be described correctly
by MRD-CI (or possibly CI) calculations including all the potential energy surfaces
for all the states which arise from separated species H + B+ lower in energy at
the asymptotes than H+ + B and all other states which cross those potential energy
surfaces. All other ion-molecule reactions A+ + B [where the ionization potential of
B is less than that of A (including those where A+ is an ultimate carcinogen which
attacks DNA, RNA, etc.)] can also only be properly described by MRD-CI (or possibly
CI) calculations of the multipotential surfaces.

The potential energy surfaces arising from the asymptotic species AH+ + B to transfer
a proton to form A + BH+ will be different from those arising from BH+ + A to
transfer a proton to form AH+ + B. To date, even the ab initio calculations reported
for proton transfer in biological/biomedical systems appear to have been only ab initio
SCF calculations (perhaps with the addition of MBPT corrections). As we pointed out
above, MBPT cannot correct for the lack of multideterminant character in a
single-determinant SCF calculation. Our finding that the fundamental character of
single-determinant SCF potential surfaces begins to deviate from the MRD-CI surfaces as
early as 0.75 bohr from the equilibrium position of the proton on a species implies that
MRD-CI (or possibly CI) calculations [for all the states arising from the asymptotes
AH+ + B or A + BH+ and curves which cross these] will be necessary to describe these
processes correctly. We are extending these MRD-CI studies to the proton transfer reactions.

B.  Differences  in  Dominant  Configurations  in  Opening  Nonprotonated  or 
Protonated  Oxirane  Rings  (Epoxides) 

Oxiranes (the 3-member ring cyclic ethers) are the simplest of the epoxides. We already
knew from our previous mrd-ci calculations on opening oxetane or protonated
oxetane rings (the 4-member ring cyclic ethers) that the single-determinant scf wave
function for the open oxetane ring had a c² contribution of only 0.69 to the mrd-ci
wave function, while the single-determinant scf wave function for the open protonated
oxetane ring had a c² contribution of 0.93 to the mrd-ci wave function. On
opening the protonated oxetane ring both electrons remain on the oxygen. This situation
is even more exaggerated in opening nonprotonated oxirane rings. For an oxirane
ring the single-determinant scf wave function contributes a c² of only 0.40 to the
mrd-ci wave function. The single-determinant scf wave function for an open protonated
oxirane ring contributes a c² of 0.92 to the mrd-ci wave function (just as for
the open protonated oxetane ring). Thus a single-determinant scf wave function is a
reasonable description of the opening of a protonated cyclic ether ring, while it is
a completely incorrect description of the opening of a neutral cyclic ether ring.
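
One way to read the c² values quoted above is as the weight of the scf determinant in a normalized ci expansion, so that 1 - c² is the weight left for all other configurations. The short Python sketch below tabulates that remainder for the four quoted values. The species labels and the function name are ours, and normalization of the wave function is an assumption.

```python
# c^2 of the single-determinant SCF configuration in the MRD-CI wave function,
# as quoted in the text for the ring-opened species.
SCF_WEIGHT = {
    "oxetane (neutral)": 0.69,
    "oxetane (protonated)": 0.93,
    "oxirane (neutral)": 0.40,
    "oxirane (protonated)": 0.92,
}


def remaining_weight(c2):
    """Weight of all other configurations, assuming a normalized CI expansion."""
    if not 0.0 <= c2 <= 1.0:
        raise ValueError("c^2 must lie between 0 and 1")
    return 1.0 - c2


for species, c2 in SCF_WEIGHT.items():
    print(f"{species:22s} c^2 = {c2:.2f}  other configurations = {remaining_weight(c2):.2f}")
```

On this reading, more than half of the wave function of the opened neutral oxirane lies outside the scf determinant, whereas the protonated rings retain over 90% single-determinant character.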

Thus, a proper calculation of the pathway for opening an epoxide ring in a metabolically
activated PAH carcinogen (or an epoxide ring in any other carcinogen or
metabolically activated carcinogen) will necessitate mrd-ci or ci calculations.
Abstract

The geometries of all 12 complexes in which HF, HCl, or HBr is paired with NH3, NMeH2, NMe2H, or
NMe3 are optimized with the mini-1 basis set. As the basicity of the amine is increased via progressive
methylation, or as the proton affinity of the halide is diminished, the proton equilibrium position shifts
toward the nitrogen, but in no case is this shift far enough to classify the complex as an ion pair. When the
effects of a polarizable medium are included via the scrf formalism, the shift of the proton toward the
nitrogen is enhanced by increases in the solute-solvent interaction such that relatively modest coupling
leads to complexes of ion-pair type. In all cases, complexes containing HBr are the most sensitive to either
the basicity of the amine or the influence of the medium whereas the HF analogs are affected very little.

Introduction 

Over the last several decades, ab initio quantum mechanical calculations have
served as a rich source of information covering a wide range of chemical reactions.
The conceptual simplicity of proton transfers, coupled with their widespread occurrence
in chemical and biological processes, has endowed this type of reaction with a
particular significance and fostered a large number of ab initio studies [1-5]. Recent
calculations in this laboratory have led to the enunciation of a set of rules which govern
proton transfer reactions and which may be used to predict the important energetic
and kinetic parameters in any given case from first principles [6-9].

Whereas our previous calculations have applied rigorously only to isolated systems
in vacuo, it is our belief that the principles apply in condensed phases as well, albeit
with some modification to account for the influence of the surrounding medium. Our
strategy for quantifying the environmental effects involves a partitioning into a number
of conceptual stages of diminishing importance. The largest effect would undoubtedly
arise from the electrostatic fields emanating from any neighboring ions or
dipoles. Hence, we first considered the effects upon the proton transfer potential of
an array of such entities in various locations about an H-bonded system [10]. Consistent
with prior work, our results supported the contention that the effects of ions and/or
dipoles can be extremely important but that they can be simply understood on the
basis of interaction of their electric fields with the partial positive charge of the proton
being transferred.

Even in the complete absence of ions or permanent dipoles, one would expect the
proton transfer potential to be affected by the immersion of the H-bonded system in a
polarizable medium. Indeed, there is ample evidence from prior calculations to support
this notion [11-18]. The present communication reports the results of a systematic
study of a number of closely related systems, immersed in a medium whose
dielectric constant is smoothly varied over a wide range. For this purpose, we choose
the H-bonded complexes composed of a hydrogen halide and an amine. HF, HCl,
and HBr are chosen for the former, so as to cover a range of proton-donating ability;
NH3, NMeH2, NMe2H, and NMe3 are proton acceptors of increasingly greater power.

Previous work of both a theoretical and experimental nature has demonstrated that
hydrogen halide-amine complexes are characterized by a single-well proton transfer
potential [19-25]. The position of this minimum, specifically, the proximity of the
proton to the amine or halide, is quite sensitive to both the nature of the subunits involved
and the characteristics of the medium in which it is immersed. For example,
H3N···HF is almost certainly a neutral pair in the gas phase while the complex between
trimethylamine and HBr is better described as an ion pair, Me3NH⁺···Br⁻.
We have placed the various systems inside a spherical cavity, surrounded by a dielectric,
and monitored the equilibrium proton position as the dielectric constant is slowly
increased. In this manner, we hope to gain insights into the effects of a polarizable
medium upon the nature of the proton transfer process.

Details  of  Calculations 

Ab initio calculations were carried out using the MONSTERGAUSS package of
computer codes [26]. The geometries of all monomers were first completely optimized
in order to calculate the deprotonation and dissociation energies reported
below. A linear H bond was assumed in all complexes. The internal geometries of the
amines were held fixed in their isolated monomer structures while the distances of the
central proton to the halide and nitrogen atoms were optimized.

The effects of the homogeneous reaction field were included via the version of
Tapia’s scrf program which is incorporated into MONSTERGAUSS. The reaction
field susceptibility, that is, the solute-solvent coupling factor, is computed as

g = (2/a^3)(ε - 1)/(2ε + 1)    (1)

where ε is the static dielectric constant of the medium and a the radius of the spherical
cavity in which the HX-amine system was placed [15-18,27].
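
To give a feel for the magnitudes involved, the following Python sketch evaluates the coupling factor, reading Eq. (1) as g = (2/a^3)(ε - 1)/(2ε + 1). It assumes that a must be expressed in bohr for g to come out in atomic units. The function name and the sample radius of 4 Å (the value used later in the text for the HCl complexes) are illustrative choices, not part of the scrf program.

```python
BOHR_PER_ANGSTROM = 1.0 / 0.529177  # assumed unit convention: a in bohr gives g in au


def coupling_factor(eps, radius_angstrom):
    """Solute-solvent coupling factor g (au) of Eq. (1)."""
    if eps < 1.0:
        raise ValueError("dielectric constant must be at least 1")
    a = radius_angstrom * BOHR_PER_ANGSTROM
    return (2.0 / a**3) * (eps - 1.0) / (2.0 * eps + 1.0)


# g vanishes in vacuo (eps = 1) and saturates as eps becomes large.
for eps in (1.0, 2.0, 10.0, 80.0):
    print(f"eps = {eps:5.1f}   g = {coupling_factor(eps, 4.0):.5f} au")
```

Because (ε - 1)/(2ε + 1) levels off near 0.5, most of the variation in g occurs at small dielectric constants.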

From previous calculations, it appears that the nature of the calculated proton
transfer potential in hydrogen halide-amine systems is especially sensitive to the
level of theory. For example, whereas scf calculations suggest double-minimum
potentials for complexes of HCl with methylated amines, inclusion of correlation
leads to a potential with a single broad minimum in all cases [19]. Similarly with
BrH-NMeH2, the two wells found in the scf potential [22] coalesce into a single
broad minimum in the correlated potential [20]. When viewed in tandem with a
wealth of inert matrix spectroscopic data, the picture that emerges is one in which
most, if not all, hydrogen halide-amine complexes are characterized by a single-well
potential. The equilibrium position of the central proton shifts smoothly between the
halide and N atoms as the relative basicities of the two component subunits are altered.

Since the scrf formalism has not been incorporated into correlated treatments, it was
not possible for us to directly include the effects of electron correlation. Further considerations,
such as the size of some of our systems and the large number of geometrical
configurations required for our systematic study, effectively rule out the use
of an extended polarized basis set. Sets of the double-ζ type would not be appropriate
here since they lead to the erroneous conclusion of a double-minimum proton transfer
potential for these systems [19].

Fortunately for our purposes, the economical mini-1 basis set, developed by Tatewaki
and Huzinaga [28], correctly reproduces the single-well nature of the potentials
of all of the hydrogen-halide systems. Moreover, as may be seen in Table I, this basis
set does a surprisingly good job with the deprotonation energies of the hydrogen
halides and protonated amines. All experimental trends are reproduced correctly; viz.
HF > HCl > HBr and the increasing basicity of the amines arising from progressive
methylation. Quantitatively, comparison of the last two columns of Table I reveals
that all energies are within 1 or 2 kcal/mol of the experimental quantities with the
exception of HCl and HBr, whose anions are predicted to be too basic by about 30 kcal/mol.
This discrepancy is not at all surprising in light of the large basis set superposition


Table I. Deprotonation Energies and Equilibrium Bond Lengths in Protonated Species.

                          Deprotonation energy, kcal/mol
                          ΔE(0 K)     ΔH(298 K)
Species      r(XH), Å     calc        calc a      expt b
HF           0.980        377.3       372.3       371.3
HCl          1.367        367.9       364.5       333.3
HBr          1.493        360.2       357.3       323.6

Species      r(NH), Å     calc        calc c      expt d
NH4+         1.085        211.5       202.8       205.0
H3MeN+       1.079        220.7       212.0       214.1
H2Me2N+      1.075        227.7       219.0       220.5
HMe3N+       1.071        232.5       223.8       224.3

a Corrected by adding 3/2 RT and zero-point vibrational energy corrections from Ref. 29.
b Ref. 30.
c Corrected by adding 5/2 RT and zero-point vibrational energy corrections from
  Ref. 31 (10.2 assumed for ΔZPE of all amines).
d Ref. 32.
error expected for the single-center halide anions with a minimal basis set. Rather,
what is surprising is the good agreement for HF, due presumably to cancellation of
the superposition error with other deficiencies of the basis set.
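
The quality of the deprotonation energies can be checked directly from the last two columns of Table I. The Python sketch below lists the calculated minus experimental ΔH(298 K) for each species. The species labels are our reading of the table, and the helper name is only illustrative.

```python
# (calculated, experimental) deprotonation enthalpies at 298 K in kcal/mol, from Table I.
TABLE_I = {
    "HF": (372.3, 371.3),
    "HCl": (364.5, 333.3),
    "HBr": (357.3, 323.6),
    "NH4+": (202.8, 205.0),
    "H3MeN+": (212.0, 214.1),
    "H2Me2N+": (219.0, 220.5),
    "HMe3N+": (223.8, 224.3),
}


def deviation(species):
    """Calculated minus experimental deprotonation enthalpy (kcal/mol)."""
    calc, expt = TABLE_I[species]
    return calc - expt


for species in TABLE_I:
    print(f"{species:8s} {deviation(species):+6.1f} kcal/mol")
```

Only HCl and HBr stand out, each overestimated by a little over 30 kcal/mol, while the remaining entries differ from experiment by roughly 2 kcal/mol or less.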

In sum, due to the good reproduction of proton transfer potentials and deprotonation
energies, we expect the scf calculations with the mini-1 basis set to provide a
reasonable model of the systems of interest. The largest errors are expected for the
complexes containing HCl and HBr for which the high deprotonation energies will
make it more difficult to extract a proton, biasing the potentials away from the ion
pair. In a broader context, although the theoretical model may have certain unavoidable
failings with respect to the specific systems, it will certainly provide insights
into the manner in which the structure of a general H bond is affected by the
difference in proton affinity between the two subunits and by its immersion in a
dielectric bath.


Complexes  in  Vacuo 

The optimized geometries of all 12 complexes, as well as their complexation energies
in vacuo, are reported in Table II. It is clear from the first grouping that in complexes
containing HF, neither the bond lengths to the central hydrogen nor the
dissociation energies E_D are much affected by the nature of the base. In complexes
containing either HCl or HBr, on the other hand, methylation of the amine leads to
(i) substantial shortening of r(NH), (ii) noticeable lengthening of r(XH), and (iii) a
larger E_D. The combined result of (i) and (ii) can be described as a shift of the proton
from the halide to the nitrogen atom, consistent with the greater basicity of the
amine. This enhanced proton acceptor ability makes for a stronger interaction with
HX, hence the increase in E_D.

It  would  be  useful  to  have  a  quantitative  measure  of  the  degree  of  sharing  of  the 
central  proton  between  the  halide  and  the  amine,  one  that  could  be  used  in  all  of  our 
systems.  Since  the  ionic  character  of  the  complex  clearly  increases  as  the  proton 
moves  away  from  the  halogen  atom,  one  might  draw  a  connection  between  r(XH) 


Table II. Calculated Geometries and Dissociation Energies of Complexes In Vacuo.

system         r(NH), Å    r(XH), Å    E_D, kcal/mol
FH-NH3         1.744       0.996       14.2
FH-NMeH2       1.733       0.997       13.8
FH-NMe2H       1.727       0.997       14.1
FH-NMe3        1.721       0.997       13.8
ClH-NH3        1.579       1.443       14.5
ClH-NMeH2      1.535       1.456       15.1
ClH-NMe2H      1.497       1.468       15.8
ClH-NMe3       1.468       1.479       16.2
BrH-NH3        1.538       1.603       12.3
BrH-NMeH2      1.456       1.637       13.3
BrH-NMe2H      1.386       1.674       14.6
BrH-NMe3       1.340       1.703       15.5
and the degree of proton transfer. However, the different size of each halogen atom
would make such an interpretation rather misleading. For example, since Br is much
larger than F, r(BrH) would naturally be much longer than r(FH), even when there is
no proton transfer at all. Nor would r(NH) be suitable since there is some variation in
the equilibrium bond length in the various protonated amines (see Table I).

Rather than use the bond lengths themselves, we have devised a parameter which
incorporates the stretches of the bonds. Specifically, Δr(XH) is defined as the difference
between r(XH) in the complex and r(XH) in the isolated hydrogen halide
molecule, the values of which are listed in Table I. Similarly, Δr(NH) represents the
stretch of the proton in the complex away from the nitrogen atom, relative to the appropriate
protonated amine, the bond lengths of which are also contained in Table I.
We define a “proton transfer parameter” as the difference between these two
stretches.

p = Δr(XH) - Δr(NH)    (2)

When  the  proton  within  a  given  complex  has  been  pulled  an  equal  distance  away 
from  both  the  halogen  and  the  nitrogen  atoms,  the  two  stretches  are  equal  and  p  =  0, 
appropriate  for  an  equally  shared  proton.  Negative  values  of  p  correspond  to  a  lesser 
stretch  away  from  the  halogen  than  from  the  amine,  which  we  interpret  as  a  neutral 
pair.  Similarly,  as  p  becomes  progressively  more  positive,  the  proton  is  pulled  more 
toward  the  amine  and  the  complex  takes  on  more  ionic  character. 
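
As a concrete check on this definition, the following Python sketch evaluates p for all 12 complexes from the bond lengths in Tables I and II. The dictionary layout (keys of halogen symbol and number of methyl groups) and the function name are our own conventions.

```python
# r(XH) in the isolated hydrogen halides and r(NH) in the protonated amines (Table I), in angstroms.
R_XH_FREE = {"F": 0.980, "Cl": 1.367, "Br": 1.493}
R_NH_FREE = {0: 1.085, 1: 1.079, 2: 1.075, 3: 1.071}  # keyed by number of methyl groups

# (r(NH), r(XH)) in each complex (Table II), keyed by (halogen, number of methyl groups).
COMPLEXES = {
    ("F", 0): (1.744, 0.996), ("F", 1): (1.733, 0.997),
    ("F", 2): (1.727, 0.997), ("F", 3): (1.721, 0.997),
    ("Cl", 0): (1.579, 1.443), ("Cl", 1): (1.535, 1.456),
    ("Cl", 2): (1.497, 1.468), ("Cl", 3): (1.468, 1.479),
    ("Br", 0): (1.538, 1.603), ("Br", 1): (1.456, 1.637),
    ("Br", 2): (1.386, 1.674), ("Br", 3): (1.340, 1.703),
}


def proton_transfer_parameter(halogen, n_methyl):
    """p of Eq. (2): stretch of the X-H bond minus stretch of the N-H bond, in angstroms."""
    r_nh, r_xh = COMPLEXES[(halogen, n_methyl)]
    stretch_xh = r_xh - R_XH_FREE[halogen]
    stretch_nh = r_nh - R_NH_FREE[n_methyl]
    return stretch_xh - stretch_nh


for key in COMPLEXES:
    print(key, f"p = {proton_transfer_parameter(*key):+.3f}")
```

With these inputs p is negative for every complex, running from about -0.42 to -0.29 Å in the ClH series and from about -0.34 to -0.06 Å in the BrH series.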

The proton transfer parameter p is presented for all 12 of our complexes in Figure 1.
The horizontal axis corresponds to the relative attracting power of the two subunits
of each complex for the central proton. Specifically, the normalized proton
affinity difference, NPAD, is defined as

NPAD = [PA(Am) - PA(X⁻)] / [PA(Am) + PA(X⁻)]    (3)

where PA(Am) and PA(X⁻) refer to the proton affinities of the amine and halide
anion, respectively. We take these proton affinities to be the calculated deprotonation
energies ΔE(0 K) of the corresponding protonated species in Table I. This normalized
quantity has found widespread use by spectroscopists over the years [23].
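
For orientation, the Python sketch below evaluates NPAD for the end members of each series, using the calculated ΔE(0 K) values of Table I as the proton affinities. The function and dictionary names are illustrative only.

```python
# Calculated deprotonation energies ΔE(0 K) from Table I, in kcal/mol.
PA_HALIDE = {"F": 377.3, "Cl": 367.9, "Br": 360.2}
PA_AMINE = {"NH3": 211.5, "NMeH2": 220.7, "NMe2H": 227.7, "NMe3": 232.5}


def npad(pa_amine, pa_halide):
    """Normalized proton affinity difference of Eq. (3)."""
    return (pa_amine - pa_halide) / (pa_amine + pa_halide)


for halogen, pa_x in PA_HALIDE.items():
    for amine in ("NH3", "NMe3"):
        print(f"{halogen}H-{amine:5s} NPAD = {npad(PA_AMINE[amine], pa_x):+.3f}")
```

All 12 complexes have negative NPAD, since every halide anion is computed to bind the proton more strongly than any of the amines.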

The data in Figure 1 are grouped together by hydrogen halide. The numbers labeling
each point indicate the number of methyl groups on the amine. Hence, the far left
point of the HF curve, labeled by 0, refers to the FH-NH3 complex and the far right
point to FH-NMe3. The flatness of the HF curve is a manifestation of the insensitivity
of the equilibrium proton position in FH-Am complexes to the basicity of the amine
Am. Indeed, r(FH) undergoes essentially no change as the proton affinity of the
amine increases over a range of 21 kcal/mol. The large negative values of p confirm
the strong neutral-pair character of the complexes containing HF.

Figure 1. Proton transfer parameter p for complexes in the gas phase plotted as a function
of the normalized proton affinity difference, NPAD, between the amine and halide. The
numerical label on each point refers to the number of methyl groups within the amine. The
negative values of p indicate a neutral pair.

When HF is replaced by HCl, p remains in the negative domain but shows a great
deal more sensitivity to NPAD. The 21 kcal/mol rise in proton affinity from NH3 to
NMe3 increases p from -0.42 Å to -0.29 Å. The sensitivity is further enhanced in
the complexes containing HBr where p varies from -0.34 Å to -0.06 Å. Indeed, the
small magnitude of the latter value of p would lead us to describe the proton in the
BrH-NMe3 complex as being almost equally shared between the Br and the base. A
major finding is thus that the increased basicity of the amine caused by progressive
methylation leads to an increased degree of proton transfer.

One may look at the results from the alternate perspective of holding the amine
constant and changing the acid. Comparison of all three points marked by any number
n of methyl groups on the amine immediately reveals that p increases in the sequence
HF < HCl < HBr. That is, a more acidic hydrogen halide leads to a greater
degree of proton transfer to the amine within the complex.

In summary, all of the complexes examined here would be best described in vacuo
as neutral pairs in which the proton is more closely associated with the halogen than
with the amine. The least acidic hydrogen halide, HF, shows little tendency to release
its proton, while our strongest base, NMe3, is nearly successful in achieving equal
sharing of the proton with our strongest acid, HBr. It was mentioned above that the
calculated proton affinities of Br⁻ and Cl⁻ are inflated when compared to experiment,
exaggerating the ability of these halogens to hold on to a proton. It would, therefore,
not be unreasonable to presume that the values of p would be appreciably larger in
the real systems, probably surpassing zero for a number of complexes which would
fall into the category of ion-pairs in vacuo.

One last facet meriting discussion concerns the “compactness” of each complex.
Addition of r(NH) and r(XH) in Table II yields the distance between the N and X
atoms, which we refer to as the H-bond length, R. In any series associated with a particular
halide atom, the decrease in r(NH) arising from progressive methylation of the
amine outweighs the smaller increase in r(XH) and R is therefore diminished. For example,
R is equal to 2.740 Å for FH-NH3 and decreases by 0.02 Å to 2.718 Å in
FH-NMe3. The corresponding reductions in R caused by trimethylation are 0.05 and 0.10
Å for the ClH and BrH series, respectively. Making use of our prior finding of
greater ion-pair character induced by methylation, we may further connect the degree
of proton transfer with a shorter H bond.
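
The H-bond lengths can be recomputed from Table II by simple addition, as in the Python sketch below. Only the NH3 and NMe3 end members of each series are included, and the names used are our own.

```python
# (r(NH), r(XH)) in angstroms for the unmethylated and trimethylated complexes (Table II).
END_MEMBERS = {
    "F": {"NH3": (1.744, 0.996), "NMe3": (1.721, 0.997)},
    "Cl": {"NH3": (1.579, 1.443), "NMe3": (1.468, 1.479)},
    "Br": {"NH3": (1.538, 1.603), "NMe3": (1.340, 1.703)},
}


def hbond_length(r_nh, r_xh):
    """N...X distance R for a linear H bond: the sum of the two bond lengths."""
    return r_nh + r_xh


def contraction(halogen):
    """Decrease in R on going from the NH3 complex to the NMe3 complex."""
    pair = END_MEMBERS[halogen]
    return hbond_length(*pair["NH3"]) - hbond_length(*pair["NMe3"])


for halogen in END_MEMBERS:
    print(f"{halogen}H series: R shortens by {contraction(halogen):.3f} angstrom")
```

The additivity relies on the linear H bond assumed in the calculations. With the Table II values as reproduced here, the FH and BrH contractions come out near 0.02 and 0.10 Å, while the ClH series gives about 0.075 Å, somewhat larger than the 0.05 Å quoted in the text; the difference may reflect rounding or a transcription slip in one of the two places.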

As a point of interest, let us consider the HBr curve in Figure 1, which would intercept
the horizontal axis at about -0.21. Using Eq. (3), we get 235 kcal/mol as the
proton affinity of an amine which could equally share a proton with Br⁻, particularly
notable since this value is some 125 kcal/mol smaller than the proton affinity of Br⁻.
The reason why an amine with a lower proton affinity can successfully compete with
the anion rests in the fact that transfer of the proton to the amine produces an ion pair
with an extremely strong electrostatic attraction. In contrast, the force holding the
complex together when the proton resides on the halogen is much smaller since both
subsystems are formally uncharged. This same principle is responsible for the aforementioned
contraction of the H bond as the proton is transferred toward the amine.
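
The 235 kcal/mol figure follows from solving Eq. (3) for PA(Am) at a given NPAD, which gives PA(Am) = PA(X⁻)(1 + NPAD)/(1 - NPAD). A minimal Python sketch of that rearrangement is given below; the function name is ours, and the intercept of -0.21 is the approximate value read from Figure 1.

```python
def amine_pa_for_npad(npad, pa_halide):
    """Solve Eq. (3) for PA(Am), given NPAD and PA(X-) in kcal/mol."""
    if npad >= 1.0:
        raise ValueError("NPAD must be below 1")
    return pa_halide * (1.0 + npad) / (1.0 - npad)


PA_BROMIDE = 360.2      # calculated ΔE(0 K) for HBr, Table I
NPAD_INTERCEPT = -0.21  # approximate intercept of the HBr curve in Figure 1

pa_amine = amine_pa_for_npad(NPAD_INTERCEPT, PA_BROMIDE)
print(f"PA(Am) for equal sharing: {pa_amine:.0f} kcal/mol")
print(f"Shortfall relative to PA(Br-): {PA_BROMIDE - pa_amine:.0f} kcal/mol")
```

Since the intercept is read from a graph, the result should be taken as good to a few kcal/mol at best.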

Immersion  in  Dielectric  Continuum 

Let us now place each complex in a spherical cavity within a dielectric medium.
As described earlier, the g parameter contains the strength of the electrostatic interaction
of the complex with the solvent, by way of the cavity radius and the dielectric
constant. The degree of proton transfer p for each complex is presented in graphical
fashion as a function of g in Figure 2. The solid curves correspond to the HF complexes,
dashed curves to HCl, and dotted to HBr. Each curve is labeled with the
number of methyl groups on the amine.

It should first be noted that all curves rise as g is increased. Thus, the greater interaction
with the solvent preferentially stabilizes the ion pair, shifting the proton toward
the amine and thus making p more positive. The solid curves which represent
the HF-containing complexes rise most gradually, indicating that even a relatively
strong interaction with solvent can only produce a slight shift of the proton toward
the amine in these intrinsically very nonionic complexes.

The increases in p are much sharper for the other curves in Figure 2, indicating
that the character of the complexes containing HCl and HBr is much more sensitive
to interaction with the solvent. As shown by the intercepts of the dashed curves with
the horizontal axis, equal sharing of the proton between HCl and the amine occurs for
g in the range 0.001-0.002 au. In order to place this range in perspective, if we take
the cavity radius to be 4 Å, the approximate maximum length of any of our HCl-containing
complexes, a value of 0.002 au for g corresponds to a dielectric constant
ε of 10.

Figure 2. Degree of proton transfer shown in terms of the strength of the interaction of the
complex with its surrounding dielectric continuum. Solid curves refer to HF complexes,
broken to HCl, and dotted to HBr. Each curve is labeled with the number of methyl groups
contained within the amine. g is in units of au.

The sensitivity of p to the influence of solvent is even greater for the complexes
containing HBr, as illustrated by the steepness of the dotted curves. This sensitivity,
in conjunction with the more nearly equal sharing of the proton even in the gas
phase, leads to Br⁻···⁺HAm ion-pairs (i.e., p > 0) for relatively small values of g.
For example, taking 4.7 Å as the radius of complexes containing HBr, the g value of
0.0005 au, for which a number of these complexes are in the ionic range of p, corresponds
to a value of only 2 for ε. Keeping in mind the inflated proton affinities of the
Cl⁻ and Br⁻ anions with the mini-1 basis set, we may anticipate that a more realistic
treatment would displace all the dashed and dotted curves upward to some extent and
would result in a crossing of the horizontal axis at reduced values of g.
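
The two conversions from g to ε quoted in this section can be reproduced by inverting Eq. (1), taken as g = (2/a^3)(ε - 1)/(2ε + 1). The Python sketch below does so under the assumption that a is expressed in bohr when g is in au; the function name is illustrative.

```python
BOHR_PER_ANGSTROM = 1.0 / 0.529177  # assumed unit convention: a in bohr gives g in au


def dielectric_constant(g, radius_angstrom):
    """Invert Eq. (1): dielectric constant that yields coupling g for a cavity of radius a."""
    a = radius_angstrom * BOHR_PER_ANGSTROM
    f = 0.5 * g * a**3  # equals (eps - 1)/(2*eps + 1)
    if not 0.0 <= f < 0.5:
        raise ValueError("g is outside the range reachable with this cavity radius")
    return (1.0 + f) / (1.0 - 2.0 * f)


print(f"HCl complexes, a = 4.0 A, g = 0.002  au: eps = {dielectric_constant(0.002, 4.0):.1f}")
print(f"HBr complexes, a = 4.7 A, g = 0.0005 au: eps = {dielectric_constant(0.0005, 4.7):.1f}")
```

The results, close to 10 and 2 respectively, are consistent with the rounded values given in the text.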

It may be noted that a number of the curves in Figure 2 do not extend all the
way to the right end of the scale. These limits relate to the finite size of each complex,
which provides a minimum for the cavity radius a. Since the expression
(ε - 1)/(2ε + 1) has a maximum value of 0.5 for infinite ε, the entire expression
for g in Eq. (1) is bounded above by the inverse cube of the molecular dimension.
Attempts to reduce a below the van der Waals radius of any complex led to unrealistic
expansion of the system.
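
The size of this upper bound is easy to estimate. The Python sketch below evaluates the limiting value 1/a^3 for the two cavity radii mentioned in the text, again assuming a in bohr for g in au; the function name is ours.

```python
BOHR_PER_ANGSTROM = 1.0 / 0.529177  # assumed unit convention: a in bohr gives g in au


def max_coupling(radius_angstrom):
    """Limit of g (au) as the dielectric constant grows without bound: (2/a^3)(1/2) = 1/a^3."""
    a = radius_angstrom * BOHR_PER_ANGSTROM
    return 1.0 / a**3


for radius in (4.0, 4.7):
    print(f"a = {radius:.1f} A: g cannot exceed {max_coupling(radius):.5f} au")
```

For these radii the ceiling lies between roughly 0.001 and 0.0025 au, which is why the curves for the larger complexes terminate earlier.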

Concomitant with the increasing ion-pair character of the various complexes as
manifested by more positive values of p, the enhanced interaction with the medium
also induces a contraction of each complex. In keeping with the earlier patterns, the
FH-amine complexes are least sensitive, with R diminishing by up to 0.035 Å when g
is increased from zero to 0.002 au. The H-bond contraction in complexes containing
HCl or HBr is substantially greater, up to 0.1 Å. This trend would undoubtedly continue
for larger values of g except for the aforementioned problem that when the
cavity radius a is reduced below the van der Waals length of the complex, an unrealistic
expansion is observed.


Conclusions 

Raising the basicity of the amine Am by progressive methylation shifts the equilibrium
position of the proton in XH-Am toward the nitrogen atom. These shifts
are most pronounced when X = Br, Cl while the proton position is almost completely
stationary with respect to changes in the amine when X = F. Nevertheless, all
12 of the complexes studied here are predicted to be nonionic in the gas phase; in
other words, the proton is stretched a smaller distance from the halide than from the
nitrogen of the amine.

Immersion of the system within a dielectric continuum leads in all cases to a more
ionic complex, as one might expect on the basis of the medium's ability to stabilize a
solute with appreciable charge separation. Due to their nearly equal sharing of the
proton even in the gas phase, and to their sensitivity to the medium, BrH-Am complexes
require only a small solute-solvent coupling to be classified as ion-pairs. The
Cl analogs require somewhat larger coupling, on the order of 0.001-0.002 au, to
reach the same state. On the other hand, the proton position in FH-Am complexes
changes very little, even when the dielectric constant is quite large. In all cases,
greater ionic character of the complex leads to a contraction of the distance separating
the halide from the amine.
Abstract

The bimolecular hydroxyalkylation of the N2-, 3-, O6-, and 7-positions of guanine by protonated oxirane
has been studied using the mndo molecular orbital procedure. The enthalpies of activation (relative to the
isolated reactants) were calculated to be 12.5, 11.4, -4.8, and -4.9 kcal mol⁻¹, respectively. The transition
state geometries were characteristically SN1-like. The forming bonds reach ca. 0-2% of their final
strengths while cleavage of the breaking bonds is ca. 77-87% complete. Their relative energies are dominated
by electrostatic interactions between the reacting moieties with little to no charge transfer involved.
The relevance of these results to the reactions of carcinogenic oxiranes and their derivatives with nucleic
acids is discussed.


Introduction 

It is widely believed that the carcinogenic effects of certain chemical compounds
are related to their abilities to covalently modify the nucleic acids [1]. However, the
kinds and amounts of adducts formed in such reactions depend markedly on the nature
of the reactive electrophile [2]. An important class of carcinogens incorporates the
oxirane ring system. Although the parent oxirane is a rather weak carcinogen, some
of its derivatives are among the most potent known [3]. A number of biologically significant
examples of this kind arise through the metabolic oxidation of substituted
alkenes and arenes [4,5]. These include the aflatoxins [6], the polycyclic aromatic
hydrocarbons [5], and various simple vinylic compounds [4].

For some years we have been using quantum mechanical calculations in an attempt
to understand the physicochemical determinants of the complex regioselectivities observed
in the reactions of various electrophiles with nucleic acids and their components
[7-14]. As part of this effort we recently completed mndo semiempirical
molecular orbital calculations [15] on the reactions of protonated oxirane (1) [11] and
several monosubstituted derivatives [13, 14] with a group of simple nucleophiles. The
present paper extends this work to the reactions of 1 at four key nucleophilic sites of
guanine (2) itself:
Calculations

Calculations were carried out using the standard mndo procedure [15] with either
programs adapted for our Harris minicomputers from versions originating in the
laboratories of Professor Dewar or an IBM version of the MOPAC package [16]. The
geometries of all species were fully optimized with no geometrical constraints. Approximate
transition state geometries were constructed on the basis of our previous
calculations [11] and refined by minimizing the square of the gradient vector in the
usual way. Examination of the eigenvector associated with the single negative eigenvalue
of the Cartesian force constant matrix for each transition state showed it to be a
true saddle point for the required process [17].

Results 

The calculated energetic data for the reactions leading to 3-6 are collected in
Table I. In contrast to our earlier studies [11, 14] we did not specifically optimize the
geometries of the ion-dipole complexes separating the reactants and transition states
which characterize reactions of this kind. Their existence is of course implied in the
two cases for which the activation enthalpies, ΔH‡ (calculated relative to the isolated
reactants), were predicted to be negative. The heats of reaction are given for conversion
to both the gauche and anti products. As expected [11, 12] the gauche conformations
were predicted to be the more stable. The alternative conformations are
illustrated for 6 in Figure 1. The gauche conformation in which the OH group lies
closest to the lowest numbered position of guanine is arbitrarily designated gauche 1
in Table I. The calculated transition state geometries are shown in Figure 2 with key
structural and electronic data summarized in Table II. The transition states for these
formally bimolecular substitution reactions were calculated to be highly unsymmetrical
with virtually no significant bond formation to the incoming nucleophile. Only for
attack at the 7-nitrogen is any more than an infinitesimal degree of charge transfer
between the reacting moieties predicted. According to our previously suggested index
of bond formation, Δn% [12], even here the forming bond reaches only ~2% of its
final strength (Table III). The transition state geometries and energies are evidently
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Table  I.  Calculated  energetics  for  the  gas-phase  bimolecular  ring  opening  of  Prolo- 
nated  oxirane  (1)  by  guanine  (2).* 


Alkylation 
site  (product) 

AH/ 

A  Hd 

AH" 

ami 

gauche  I 

gauche  2 

N2-  (3) 

193.2 

12.5 

-32.8 

-38.1 

-37.7 

3-  (4) 

192.1 

11.4 

e 

-53.1 

-51.4 

O6-  (5) 

175.9 

-4.8 

-62.8 

-65.4 

— 

7-  (6) 

175.8 

-4.9 

-66.5 

-69.2 

-70.9 

Nonef 

187.5 

14  5 

‘Energies  in  kcal  mol-1,  distances  in  Angstroms. 

bHeat  of  formation  of  transition  state.  A Hf  (1)  =  173.0  kcal  mol'1;  A Hf  (2)  = 
7.7  kcal  mol'1. 

'Activation  energy  relative  to  isolated  reactants. 

dHeats  of  reaction  to  alternative  product  conformations.  See  text. 

'No  stable  minimum  corresponding  to  this  conformation. 

'Unimolecular  ring  opening. 
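
The activation enthalpies in Table I follow from the tabulated heats of formation by simple subtraction. The short sketch below reproduces that arithmetic; the function and variable names are illustrative only and are not part of the original calculations.

```python
# Heats of formation (kcal/mol) quoted in footnote b of Table I.
DHF_PROTONATED_OXIRANE = 173.0  # species 1
DHF_GUANINE = 7.7               # species 2

# MNDO heats of formation of the transition states (Table I), keyed by site.
DHF_TS = {"N2": 193.2, "3": 192.1, "O6": 175.9, "7": 175.8}


def activation_enthalpy(dhf_ts, dhf_reactants):
    """Activation enthalpy relative to the isolated reactants."""
    return dhf_ts - sum(dhf_reactants)


# Bimolecular barriers: transition state minus (1 + 2).
barriers = {
    site: round(activation_enthalpy(h, (DHF_PROTONATED_OXIRANE, DHF_GUANINE)), 1)
    for site, h in DHF_TS.items()
}

# Unimolecular ring opening: transition state minus 1 only.
unimolecular = round(activation_enthalpy(187.5, (DHF_PROTONATED_OXIRANE,)), 1)
```

The negative entries for the O6- and 7-positions arise because the transition state lies below the separated reactants, which is what implies the intervening ion-dipole complexes.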


Figure 1. Alternative conformations of 2-(7-guanyl)-ethanol (6): gauche 1, anti, and
gauche 2.


dominated by electrostatic effects. Although the orientations of the reacting moieties
follow general expectations for bimolecular displacements, the angles between the
entering and exiting atoms (θ in Table II) are significantly distorted from colinearity.
This allows the most favorable placement of the oxirane in the regions of local
electrostatic attraction. Thus, for attack at both the O6- and 7-positions, the centroid
of charge of the oxirane resides in the deep attractive well between these two positions,
the individual geometries differing only in the orientation of the exiting group [17].
In the transition state for attack at N2-, the oxirane moiety is significantly displaced
toward the electrostatically attractive region above the adjacent 3-nitrogen. A simple
estimate of the electrostatic interaction, E_el, between the approaching reactants was
made using Eq. 1, where q_i and q_j are the MNDO charges in the guanine and oxirane
moieties, respectively, and r_ij the corresponding internuclear separations.

$$E_{el} = \sum_{i}^{\mathrm{guanine}} \sum_{j}^{\mathrm{oxirane}} \frac{q_i q_j}{r_{ij}} \qquad (1)$$

These calculations were performed in two ways. Both were based on the calculated
transition state geometries for attack at the specified site. However, in the first the
charges were calculated for each moiety unperturbed by the other. In the second,
designated E'_el, the charge distribution was taken directly from the calculated
transition state itself. Both reveal trends in qualitative accord with the published
electrostatic potential maps of guanine [18], while the difference between them, δE_el,
provides a quantitative measure of the additional attractive interaction arising from
their mutual polarization. The latter is evidently (Table III) greatest for
electrophiles approaching in the vicinity of the O6- and 7-positions and least for
approach to the amino group. From the calculated charge distribution the high
polarizability at the former positions is associated with increased weights of dipolar
resonance forms akin to 7 and 8.

[Structures 7 and 8]

Figure 2. Calculated transition state geometries for attack of protonated oxirane at the
guanine (a) N2-; (b) 3-; (c) O6-; and (d) 7-positions.
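
The point-charge sum of Eq. 1 is straightforward to evaluate numerically. The following is a minimal sketch only: the conversion factor of 332.06 kcal Å mol⁻¹ e⁻² is an assumption introduced here so that charges in units of e and distances in Å give kcal mol⁻¹, and the two small fragments are hypothetical placeholders, not the MNDO charges or geometries of this work.

```python
import math

# Assumed conversion factor: charges in e, distances in Angstrom -> kcal/mol.
COULOMB_KCAL = 332.06


def electrostatic_interaction(fragment_a, fragment_b):
    """Pairwise point-charge sum of Eq. 1 between two fragments.

    Each fragment is a list of (charge, (x, y, z)) tuples; only
    inter-fragment pairs are included, as in the double sum of Eq. 1.
    """
    total = 0.0
    for q_i, pos_i in fragment_a:
        for q_j, pos_j in fragment_b:
            total += q_i * q_j / math.dist(pos_i, pos_j)
    return COULOMB_KCAL * total


# Hypothetical two-site fragments used purely for illustration.
guanine_like = [(-0.40, (0.0, 0.0, 0.0)), (0.10, (1.3, 0.0, 0.0))]
oxirane_like = [(0.45, (0.0, 0.0, 3.0)), (0.20, (1.3, 0.0, 3.0))]

e_el = electrostatic_interaction(guanine_like, oxirane_like)
```

Evaluating the same function once with the charges of the isolated moieties and once with the charges taken from the transition state would give E_el and E'_el, respectively, and their difference δE_el.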
Table II. Calculated transition state properties for the gas-phase bimolecular
hydroxyalkylation of 2 by protonated oxirane.ᵃ

Alkylation                                        δq(CH2)ᶜ    q(guanine)ᵈ
site         r1      r2      θ      φ      iνᵇ    × 10³       × 10³

N2-          3.34    1.96    154    84     409    308         1
3-           2.97    1.93    148    83     439    303         3
O6-          2.87    1.90    139    81     447    311         4
7-           2.48    1.84    152    78     538    257         38
Noneᵉ        --      1.99    --     86     386    307         --

ᵃBond lengths in Angstroms and bond angles in degrees. Definition of r1, r2, θ, and φ:
[structural diagram]
ᵇFrequency of imaginary vibration interconverting reactants and products (cm⁻¹).
ᶜCharge at the reacting CH2 group in the transition state relative to 1 (0.409).
ᵈTotal charge associated with guanine moiety.
ᵉUnimolecular ring opening.


Table III. Extents of bond formation and cleavage,ᵃ electrostatic interactions,ᵇ and
deformation energiesᶜ in the calculated transition states.

Alkylation
site       Δn1%    Δn2%    E_el     E'_el    δE_el    δΔHf(1)   δΔHf(2)

N2-        0.1     86.5    -1.2     -5.1     -3.9     14.5      1.7
3-         0.4     84.3    -3.2     -9.3     -6.1     14.3      3.0
O6-        0.4     82.7    -16.1    -25.4    -9.3     14.1      0.8
7-         2.3     76.7    -15.3    -23.4    -8.1     13.2      0.8
Noneᵈ      --      89.2    --       --       --       --        --

ᵃOrders of the forming (r1) and breaking (r2) bonds as a percentage of the net change for
the reaction (cf. Ref. 12):
$$\Delta n\% = 100\,\frac{\exp(-r^{\ddagger}/0.26) - \exp(-r^{R}/0.26)}{\exp(-r^{P}/0.26) - \exp(-r^{R}/0.26)}$$
ᵇE_el: electrostatic interaction between guanine and oxirane moieties in the transition
states. Cf. Eq. (1) and text. δE_el = E'_el - E_el.
ᶜδΔHf(1), δΔHf(2): heats of formation of the protonated oxirane and guanine moieties in
the corresponding transition states, relative to those of the equilibrium geometries of
the isolated molecules.
ᵈUnimolecular ring opening.
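
The bond index in footnote a of Table III can be evaluated directly from bond lengths. The sketch below assumes the reading of the footnote in which the three lengths refer to the transition state, reactant, and product; the product C-N length of 1.50 Å and the treatment of the reactants as infinitely separated are illustrative assumptions, not values taken from the original work.

```python
import math

PAULING_CONSTANT = 0.26  # Angstrom, the constant appearing in the footnote


def bond_order(r):
    """Pauling-type relative bond order for a bond of length r (Angstrom)."""
    return math.exp(-r / PAULING_CONSTANT)


def delta_n_percent(r_ts, r_reactant, r_product):
    """Bond-order change at the transition state as a percentage of the
    net bond-order change for the whole reaction."""
    n_ts = bond_order(r_ts)
    n_r = bond_order(r_reactant)
    n_p = bond_order(r_product)
    return 100.0 * (n_ts - n_r) / (n_p - n_r)


# Forming bond for attack at the 7-position: r1 = 2.48 Angstrom (Table II).
# Reactants at infinite separation (bond order zero); product length assumed.
forming_7 = delta_n_percent(r_ts=2.48, r_reactant=math.inf, r_product=1.50)
```

With these assumed inputs the forming bond comes out at roughly 2% of its final order, in line with the entry for the 7-position in Table III.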
Also included in Table III are the amounts by which the heats of formation of the
individual reacting moieties in their transition state geometries differ from those of
the isolated molecules. These again are indicative of a situation in which the oxirane
ring, although residing in the electrostatically attractive regions of the guanine, opens
with little nucleophilic assistance. In reaching the transition state geometry, the
guanine moiety undergoes distortions equivalent to only 1-3 kcal mol⁻¹, while the
energetic requirements of the oxirane moieties (13.2-14.5 kcal mol⁻¹) are very similar to
the activation energy for unimolecular ring opening (14.5 kcal mol⁻¹, Table I).


Discussion 

The very early, or SN1-like, character of the transition states for the gas-phase
bimolecular ring opening of protonated oxirane (1) was evident in our earlier
calculations for simple nucleophiles [11]. A parallel situation for related reactions in
aqueous solution had been suggested some time earlier by Parker and Isaacs, who
designated them "borderline" SN2 processes [19]. While calculations for the gas-phase
reactions should be of significant value in interpreting those in aqueous solution, a
quantitative extrapolation between the two phases naturally involves uncertainties.
We can be reasonably confident that the unimolecular ring opening of 1 will be less
facile in aqueous solution than in the gas phase. Thus, the specific intermolecular
hydrogen bond which contributes ca. 18 kcal mol⁻¹ to the hydration free energies of
protonated ethers [9] will be significantly weakened as cleavage of the CO bond begins
to dissipate the positive charge formally associated with the onium group. Using
ideas deduced from our earlier work on the alkanediazonium ions [8], raising the
endothermicity of the unimolecular reaction implies an increased role of covalent
interactions in the transition state for the corresponding bimolecular process. This in
turn is expected to facilitate attack at the guanine 7- relative to the O6-position. The
activation enthalpies for attack of protonated oxirane at these positions in the gas
phase are predicted to be almost identical. If the foregoing arguments are correct,
hydration effects should tip this balance in favor of 7-alkylation. Indeed, only
7-alkylated products have been reported for the reactions of oxirane and its simple
alkyl derivatives with deoxyguanosine in aqueous solution [20,21]. Thus, the apparent
absence of O6-alkylation may well be explicable in these terms. Guanine 3-alkylation has
not been reported for simple epoxides, although minor adducts of other alkylating agents
at this site have been detected [1]. The least favorable alkylation site is predicted to
be the exocyclic amino group, N2. Again, alkylation by simple epoxides has not been
reported at this site and, only with very rare exceptions, is it a target for related
alkylating agents [1]. Interestingly, however, styrene oxide reacts with deoxyguanosine
in aqueous solution to give ca. 12% and 28% of the corresponding O6 and N2 adducts 9
and 10, in addition to products resulting from attack at the 7-position [22]. Moreover,
for a number of aryl epoxides derived from polycyclic aromatic hydrocarbons, the
corresponding N2 adducts are the major reaction products [5]. We have previously
suggested that in aqueous solution, adduct formation at the N2-position should be
more favorable than implied by the gas phase calculations due to specific hydrogen
bonding to the polar N+-H bonds of the nitrogen undergoing quaternization [9].
While not manifest in the reactions of simple alkylating agents, this effect could
become significant for the more reactive aryl epoxides, where the intrinsic differences
between the activation energies at alternative sites are less. We have yet to scrutinize
the reactions of aryl epoxides theoretically. However, on the basis of the present
results we are tempted to suggest an additional factor which is expected to reduce the
difference between the activation barriers for guanine 7 and N2 attack in the case of
aryl epoxides. As noted in the previous section, the greater electrostatic attraction
experienced by electrophiles approaching the former site is due, in part, to the greater
polarizability of the guanine undergoing this mode of attack. This effect should, in
turn, be most pronounced in transition states involving the aliphatic epoxides where
the positive charge is highly localized on the reacting methylene group. For aryl
epoxides, where the corresponding position is a benzylic carbon, the charge will be
significantly delocalized and polarization effects therefore less important. An effect
of this kind is probably responsible for the clear shift from 7- to N2-benzylation of
deoxyguanosine observed by Dipple and co-workers [23,24] for a series of benzyl
chlorides (11a-e) bearing increasingly electron-donating substituents.


[Structures 9, 10, and 11a-e]


Acknowledgment 

We thank the National Institutes of Health for financial support for this research
through Grant No. CA 3873, and the SMU Computer Center for a generous allocation
of computer time. The H800 was a gift from the Harris Corporation to the SMU
Department of Civil and Mechanical Engineering.


Bibliography 

[1] B. Singer and D. Grunberger, Molecular Biology of Mutagens and Carcinogens (Plenum, New
York, 1983).

[2] K. Hemminki, Arch. Toxicol. 52, 249 (1983).

[3] IARC Monographs on the Evaluation of the Carcinogenic Risk of Chemicals to Humans. Allyl
Compounds, Aldehydes, Epoxides, and Peroxides, vol. 36 (International Agency for Research on
Cancer, Lyon, 1985), p. 189; H. Vainio, K. Hemminki, and J. Wilbourn, Carcinogenesis 6, 1653
(1985).

[4] L. Ehrenberg and S. Hussain, Mutat. Res. 86, 1 (1981).

[5] A. Dipple, R. C. Moschel, and A. H. Bigger, Chemical Carcinogens, ACS Monogr. 182, vol. 1,
C. E. Searle, Ed. (American Chemical Society, Washington, D.C., 1984), pp. 41-174.

[6] W. F. Busby, Jr., and G. N. Wogan, Chemical Carcinogens, ACS Monogr. 182, vol. 2, C. E.
Searle, Ed. (American Chemical Society, Washington, D.C., 1984), pp. 945-1136.

[7] G. P. Ford and J. D. Scribner, J. Am. Chem. Soc. 103, 4281 (1981).

[8] G. P. Ford and J. D. Scribner, J. Am. Chem. Soc. 105, 349 (1983).

[9] G. P. Ford and J. D. Scribner, J. Org. Chem. 48, 2226 (1983).

[10] G. P. Ford, J. Am. Chem. Soc. 108, 5104 (1986).

[11] G. P. Ford and C. T. Smith, Int. J. Quantum Chem. 13, 107 (1986).

[12] G. P. Ford and C. T. Smith, J. Am. Chem. Soc. 109, 1325 (1987).

[13] G. P. Ford and C. T. Smith, J.C.S. Chem. Commun. 44 (1987).

[14] G. P. Ford and C. T. Smith, J. Comput. Chem. (in preparation).

[15] M. J. S. Dewar and W. Thiel, J. Am. Chem. Soc. 99, 4899, 4907 (1977).

[16] S. Olivella, QCPE Bull. 4, 109 (1984).

[17] The transition state geometries shown in Figure 2 represent those of lowest energy for
attack at each site. We also located a series of slightly higher energy (0-1 kcal mol⁻¹)
transition structures with the alternative configuration about the oxirane oxygen. Further
transition states, related to those shown in Figure 2 by rotations of ca. 90° about the forming
bond, were also located. The corresponding activation energies via these were: 3-, 12.0 kcal
mol⁻¹; O6-, -4.4 and -4.7 kcal mol⁻¹; 7-, -3.5 kcal mol⁻¹. In the interest of brevity the
individual structures are not discussed in detail.

[18] R. Bonaccorsi, E. Scrocco, J. Tomasi, and A. Pullman, Theoret. Chim. Acta 36, 339 (1975).

[19] R. E. Parker and N. S. Isaacs, Chem. Rev. 59, 737 (1959).

[20] P. Brookes and P. D. Lawley, J. Chem. Soc. 3923 (1961).

[21] P. D. Lawley and M. Jarman, Biochem. J. 126, 893 (1972).

[22] K. Hemminki and A. Hesso, Carcinogenesis 5, 601 (1984); K. Hemminki, 7th Annual
Interdisciplinary Cancer Research Workshop, Univ. New Orleans, Feb. 24, 1984.

[23] A. Dipple, R. C. Moschel, and W. R. Huggins, Drug Metab. Rev. 13, 249 (1982).

[24] For an alternative statement of somewhat similar ideas see: R. C. Moschel, W. R. Huggins,
and A. Dipple, J. Org. Chem. 44, 3324 (1979).

Received  April  2,  1987 


Application  of  the  Quantum  Mechanics  and 
Free  Energy  Perturbation  Methods  to 
Study  Molecular  Processes 

PIOTR CIEPLAK,* U. CHANDRA SINGH,† AND PETER A. KOLLMAN

Department  of  Pharmaceutical  Chemistry,  School  of  Pharmacy,  University  of  California,  San  Francisco, 

California  94143,  U.S.A. 


Abstract 


The molecular dynamics free energy perturbation method was applied to study the effect of
solvation on tautomeric equilibria in water solution, as well as the association of nucleic
acid base pairs in water solution and in vacuo. Tautomerization energies in vacuo calculated
by the ab initio SCF-HF method differed from experiment by 1-2 kcal/mol, even when geometry
optimization was performed and the MP2 correlation energy calculated with the 6-31G* basis
set was added.


Introduction 

A new approach applying quantum mechanics (QM) together with molecular dynamics
free energy perturbation (FEP/MD) [1,2] methods has been used to investigate
chemical processes such as tautomeric equilibria and the association of the nucleic
acid bases in vacuo and in water solution.

The gas-phase energy differences were calculated for 2-oxo-pyridine (I), 2-oxo-pyrimidine
(II), and cytosine (III) tautomers (Fig. 1) by the ab initio Hartree-Fock method in the
extended 6-31G* basis set for geometries optimized at the 3-21G level. For the test case
of 2-oxo-pyridine tautomers, geometry optimization and the calculation of the correlation
energy by the MP2 [3] method with a 6-31G* basis set were also performed. The FEP/MD
method was applied to study the hydration effect on tautomeric equilibria. Combining the
QM and FEP/MD results gives good agreement with experiment [4-7].

Preliminary calculations [8] using the FEP/MD method for nucleic acid base association
in water and in vacuo have been carried out. Five complexes were considered:
adenine-thymine Watson-Crick H-bonded pair, adenine-thymine stacked, adenine-adenine
stacked, guanine-cytosine Watson-Crick H-bonded, and guanine-cytosine stacked. The
stacked complexes were calculated to be slightly more stable than the H-bonded ones in
water, whereas in vacuo the H-bonded complexes are favored, which is consistent with
experiment [9].


*Permanent address: Quantum Chemistry Laboratory, Department of Chemistry, University of
Warsaw, Pasteura 1, 02-093 Warsaw, Poland.

†Research Institute of Scripps Clinic, Department of Molecular Biology, La Jolla, CA 92037.


INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY: QUANTUM BIOLOGY SYMPOSIUM 14, 065-074 (1987)

© 1987 by John Wiley & Sons, Inc. CCC 0020-7608/86/010065-10$04.00




Figure 1. Tautomeric structures, atom nomenclature, and numbering scheme for
2-oxo-pyridine (I), 2-oxo-pyrimidine (II), and cytosine (III).


The FEP/MD method appears to be general and is a powerful way to obtain equilibrium
constants and Gibbs free energy differences of solvation from simulations.

Free  Energy  Perturbation  Method  Formalism 

Several statistical mechanical procedures have been developed to compute free energy
differences for solutions [10-12]. Recent applications [1,12-19] of the thermodynamic
free energy perturbation method confirm its general applicability in Monte Carlo (MC)
as well as molecular dynamics (MD) simulations. It is also a robust tool for calculating
relative free energy differences in the general case, not only when solvation processes
are of interest. Here we summarize the theoretical basis of the original method given by
Zwanzig [20].

In a canonical ensemble (e.g., generated by standard MC or MD at constant temperature)
the Helmholtz free energy is:

$$F = -kT \ln Z \qquad (1)$$

where Z is the partition function determined by the Hamiltonian H(p, q):

$$Z = \int \exp\left(-\frac{H(p,q)}{kT}\right) dp\,dq \qquad (2)$$

The aim is to calculate the free energy difference ΔF between systems (or states) A and
B whose Hamiltonians differ by a perturbation ΔH:

$$H_B = H_A + \Delta H \qquad (3)$$

This yields:

$$\Delta F = F_B - F_A = -kT \ln \frac{Z_B}{Z_A} = -kT \ln \frac{\int \exp\left(-\frac{H_A + \Delta H}{kT}\right) dp\,dq}{\int \exp\left(-\frac{H_A}{kT}\right) dp\,dq} \qquad (4)$$

and:

$$\Delta F = -kT \ln \frac{\int \exp\left(-\frac{H_A}{kT}\right) \exp\left(-\frac{\Delta H}{kT}\right) dp\,dq}{\int \exp\left(-\frac{H_A}{kT}\right) dp\,dq} \qquad (5)$$

One should keep in mind that, in general, this last equation can be obtained from
Eq. (4) if and only if H_A and ΔH commute, which is true when the classical expression
for the Hamiltonian of the system under consideration is used. In this case we can
rewrite Eq. (5) as follows:

$$\Delta F = -kT \ln \left\langle \exp\left(-\frac{\Delta H}{kT}\right) \right\rangle_A \qquad (6)$$

where ⟨ ⟩_A denotes an average over the system A. In the (P, T) ensemble the Gibbs free
energy difference ΔG is obtained. Expression (6) can be used immediately in Monte Carlo
and molecular dynamics simulations to calculate ΔF.
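
As an illustration of how Eq. (6) is used in practice, the sketch below applies it to a toy problem for which the answer is known analytically. The one-dimensional harmonic wells, the force constants, and the function name are assumptions made for this example; direct Gaussian sampling stands in for an MC or MD trajectory of state A.

```python
import math
import random


def fep_free_energy(delta_h_samples, kT):
    """Eq. (6): dF = -kT ln < exp(-dH/kT) >_A, from samples drawn in state A."""
    boltzmann = [math.exp(-dh / kT) for dh in delta_h_samples]
    return -kT * math.log(sum(boltzmann) / len(boltzmann))


# Toy model (illustrative assumption): U(x) = 0.5 * k * x**2 in both states.
kT = 0.596             # kcal/mol, roughly 300 K
k_a, k_b = 10.0, 20.0  # force constants of states A and B

rng = random.Random(0)
sigma_a = math.sqrt(kT / k_a)  # width of the Boltzmann distribution of A
xs = [rng.gauss(0.0, sigma_a) for _ in range(200_000)]

# Perturbation energy dH = U_B(x) - U_A(x) evaluated on configurations of A.
delta_h = [0.5 * (k_b - k_a) * x * x for x in xs]

estimate = fep_free_energy(delta_h, kT)
exact = 0.5 * kT * math.log(k_b / k_a)  # analytic result for this toy model
```

For this small perturbation the estimate converges quickly to the analytic value; the convergence problems discussed next appear when ΔH is large compared with kT.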

Direct calculation of the expectation value in Eq. (6) via MC or MD encounters
convergence problems if ΔH is large. To overcome this problem the umbrella sampling
method developed by Torrie and Valleau [21] can be used. Alternatively, one can combine
many small perturbations coupled to a dimensionless parameter λ along the path between
A and B and sum up the free energy differences obtained for each of these small
perturbation steps. This is the basis for the windowing procedure, that is:

$$\Delta G_i = -kT \ln \left\langle \exp\left(-\frac{\Delta H(\lambda \to \lambda')}{kT}\right) \right\rangle_{\lambda} \qquad (7)$$

$$\Delta G = \sum_{\lambda=0}^{1} \Delta G_i \qquad (8)$$

For each window separate MD (or MC) simulations with equilibration and data collection
stages are performed. If infinitesimal steps in λ are chosen, the average ⟨ ⟩_λ does not
fluctuate (i.e., the system can be regarded as being in equilibrium), and this yields:

$$\Delta G_i = H_{\lambda'} - H_{\lambda} \qquad (9)$$

This expression is the essence of the so-called slow growth procedure, in which one
needs only to define the total time (in the case of MD) or total number of
configurations to be generated (in the case of MC) during which the conversion from
state A to state B proceeds.
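
The windowing procedure of Eqs. (7) and (8) can be sketched on the same kind of toy model. Everything specific here is an assumption for illustration: the one-dimensional harmonic potential, the evenly spaced intermediate force constants, and the use of direct Gaussian sampling in place of a separate equilibrated MD or MC run per window.

```python
import math
import random


def window_free_energy(k_lam, k_next, kT, rng, n_samples=20_000):
    """Eq. (7) for one window of a toy model U(x) = 0.5 * k * x**2."""
    sigma = math.sqrt(kT / k_lam)  # equilibrium width at the current window
    acc = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma)  # stands in for a sampled configuration
        acc += math.exp(-0.5 * (k_next - k_lam) * x * x / kT)
    return -kT * math.log(acc / n_samples)


def windowed_free_energy(k_a, k_b, kT, n_windows=10, seed=0):
    """Eq. (8): sum the contributions of all windows along the path A -> B."""
    rng = random.Random(seed)
    ks = [k_a + (k_b - k_a) * i / n_windows for i in range(n_windows + 1)]
    return sum(
        window_free_energy(ks[i], ks[i + 1], kT, rng) for i in range(n_windows)
    )


kT = 0.596  # kcal/mol, roughly 300 K
total = windowed_free_energy(10.0, 100.0, kT)
exact = 0.5 * kT * math.log(100.0 / 10.0)  # analytic result for the toy model
```

Splitting a tenfold change in force constant into ten windows keeps each perturbation small, so every window average converges well even though the end states differ substantially.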




Methods 


In our calculations, the Gaussian-80-UCSF [22] and Gaussian-82 [23] programs were
used for the ab initio calculations. Partial atomic charges used in the MD simulations
were calculated by fitting to the electrostatic potential obtained from ab initio
6-31G* [24] wave functions [25] for isolated molecules.

The molecular simulation program AMBER-UCSF (Version 3.0) [2] with its force
field was used to calculate free energy differences by the FEP/MD method. Molecular
dynamics simulations were carried out at T = 300 K. In the solution calculations the
TIP3P [26] water model was used at a constant pressure of 1 atm with periodic boundary
conditions. Complete computational details are given in Refs. 8 and 19.

The free energy perturbation method was incorporated into the molecular dynamics
module of the AMBER program in the following way [1,2]. The molecular mechanical
energy is calculated according to the following formula:


$$E_{total} = \sum_{bonds} K_r (r - r_{eq})^2 + \sum_{angles} K_\theta (\theta - \theta_{eq})^2 + \sum_{dihedrals} \frac{V_n}{2}\left[1 + \cos(n\phi - \gamma)\right] + \sum_{i<j} \left[\frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} + \frac{q_i q_j}{\varepsilon R_{ij}}\right] + \sum_{H\text{-}bonds} \left[\frac{C_{ij}}{R_{ij}^{12}} - \frac{D_{ij}}{R_{ij}^{10}}\right] \qquad (10)$$

where the first three terms represent the difference in energy between a geometry in
which the bond lengths, bond angles, and dihedral angles have ideal values and the
actual geometry. The remaining terms represent nonbonded van der Waals and
electrostatic interactions. The last term (10-12) is used for atoms involved in hydrogen
bonding. To represent changes or "mutations" of a given group of atoms from state A
to state B coupled to the dimensionless parameter λ, the appropriate values of K_r, r_eq,
K_θ, θ_eq, E_dihedral, q_i, A_ij, B_ij, C_ij, and D_ij for a given λ were calculated
according to the linear interpolation prescription:

$$X(\lambda) = \lambda X_A + (1 - \lambda) X_B \qquad (11)$$

Since the Helmholtz free energy is a state function, the computed free energy changes
are path independent (i.e., they should not depend on the manner in which the "mutation"
from state A to B is performed). We also assume that in our calculation the changes
in ΔH are mainly due to the potential energy, and that contributions from the kinetic
energy change cancel out and/or are negligible.
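
The interpolation prescription of Eq. (11) is simple enough to state in a few lines of code. In the sketch below the container class, its three fields, and all numerical values are hypothetical and chosen only to illustrate a proton being "vanished"; they are not AMBER parameters.

```python
from dataclasses import dataclass


@dataclass
class AtomParams:
    """A small, illustrative subset of per-atom force-field parameters."""
    charge: float
    a_ij: float  # repulsive (R^-12) coefficient
    b_ij: float  # attractive (R^-6) coefficient


def interpolate(params_a, params_b, lam):
    """Eq. (11): X(lam) = lam * X_A + (1 - lam) * X_B.

    With this convention lam = 1 corresponds to state A and lam = 0 to state B.
    """
    def mix(x_a, x_b):
        return lam * x_a + (1.0 - lam) * x_b

    return AtomParams(
        charge=mix(params_a.charge, params_b.charge),
        a_ij=mix(params_a.a_ij, params_b.a_ij),
        b_ij=mix(params_a.b_ij, params_b.b_ij),
    )


# Hypothetical end states: a real proton (A) and a non-interacting dummy (B).
proton = AtomParams(charge=0.4, a_ij=100.0, b_ij=10.0)
dummy = AtomParams(charge=0.0, a_ij=0.0, b_ij=0.0)

halfway = interpolate(proton, dummy, 0.5)
```

Applying the same interpolation in the opposite sense to a dummy atom at a second position gives the simultaneous vanishing and growing of a tautomeric proton.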
The above procedure was tested extensively for several cases where accurate
experimental data are available. For example, the calculated [1,13] ΔΔG of hydration
between CH3OH and CH3CH3 is 6.6-6.9 kcal/mol, whereas the experimental value is
6.9 kcal/mol. In the case of ΔΔG for the reaction NH4+(H2O)3 → H3O+(H2O)3,
20.0 kcal/mol was obtained [1] versus 21.3 kcal/mol for the experimental value. If the
above reaction was performed during MD simulation in a water bath, the value of
21.1 kcal/mol was calculated [1], which is also very close to the experimental value of
~20 kcal/mol. The above scheme was also applied to study the solvation of amino acids
and nucleic acid bases [28] as well as to calculate the energetics of protein-inhibitor
binding [17] and site-specific mutagenesis phenomena [29]. In this paper we present some
results obtained by the FEP/MD method applied to study tautomerism and association
phenomena in water solutions.

In our specific examples, the transformation of one tautomer into another in water
solution was carried out by changing the parameters so that the tautomeric proton
vanishes at one position (e.g., oxygen) while simultaneously growing at another
position (e.g., nitrogen).

To model the association (or rather dissociation) process for the system A-B in vacuo,
the charges and the van der Waals parameters for one of the molecules were decreased to
zero during the MD simulation. To model the solvation of A and B or the A-B complex, a
simulation was done on the system with the solute fully represented and then its
electrostatic and van der Waals parameters were decreased to zero. The results
were evaluated with the thermodynamic cycle in Figure 2.
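
The way the separate simulations combine can be sketched as follows. This assumes the usual form of such a cycle, in which the association free energy in water is obtained from the gas-phase association and the three solvation free energies; the function name and the numerical inputs are hypothetical and are not results from this work.

```python
def association_free_energy_in_water(dg_assoc_vacuum, dg_solv_complex,
                                     dg_solv_a, dg_solv_b):
    """Close the cycle for A(aq) + B(aq) -> AB(aq).

    Path: desolvate A and B, associate them in vacuo, then solvate AB.
    All arguments and the result are free energies in the same units.
    """
    return dg_assoc_vacuum + dg_solv_complex - dg_solv_a - dg_solv_b


# Hypothetical inputs in kcal/mol, for illustration only.
dg_water = association_free_energy_in_water(
    dg_assoc_vacuum=-10.0,
    dg_solv_complex=-15.0,
    dg_solv_a=-9.0,
    dg_solv_b=-11.0,
)
```

Because the monomers are usually better solvated in total than the complex, association in water is typically much weaker than in vacuo, as in this made-up example.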

Tautomerism 

In Table I the tautomerization energies are summarized. Zero-point energy differences
Δε0 between tautomers were estimated by the MINDO/3 method. Since they are in the
range 0.6-0.9 kcal/mol they cannot be neglected, because a correction of this magnitude
significantly influences the tautomerization energy. We have neglected the differences
which arise from the temperature dependence of vibrational and rotational energies and
entropies because at the MINDO/3 level these are less than 0.1 kcal/mol and
0.2 kcal/mol, respectively.

Our ab initio calculations are extensions of the work done by Scanlan and Hillier
[29] for cytosine tautomerism and Schlegel [30] on the 2-oxopyridone/2-hydroxypyridine
equilibrium. It was also found that obtaining proper results for tautomerization
energies in the gas phase requires a proper choice of both the geometries of the
tautomers and the method for the energy calculation.

Figure 2. Thermodynamic cycle used to calculate the free energy of association of
nucleic acid base pairs in water.

Table I. Tautomerization energies ΔE (kcal/mol) (defined as the difference between the
appropriate total energies obtained from ab initio calculations) for 2-oxo-pyridine (I),
2-oxo-pyrimidine (II), and cytosine (III) isomers in the gas phase calculated at
different levels.

                                  ΔE =            ΔE =            ΔE =              ΔE =
                                  E(Ib) - E(Ia)   E(Ic) - E(Ia)   E(IIb) - E(IIa)   E(IIIa) - E(IIIb)

a) HF/3-21G
   geometry optimization          1.67            17.19           1.14              -0.39
b) HF/6-31G*, geometry
   optimized in 3-21G             0.55            6.69            -1.85             -0.61
c) HF/6-31G*
   geometry optimization          0.10            6.14
d) MP2/6-31G*, geometry
   optimized in HF/6-31G*         -1.61           4.40

Zero-point vibration
energy differences
(from MINDO/3)                    -0.73           -0.85           -0.63             0.58

Estimatesᵃ of            b)       -0.18           5.84            -2.48             -0.03
tautomerization          c)       -0.63           5.29
energy                   d)       -2.34           3.55

Experimental ΔΔG                  -0.4 ± 0.7 (IR)                 -2.4
                                  -0.5 ± 0.8 (UV)
                                  -0.6 ± 0.1 (X-PES)

ᵃValues given under labels b, c, d refer to the calculations done b) with HF/6-31G* and
geometry optimized in the 3-21G basis set, c) with HF/6-31G* geometry optimization, and
d) with MP2/6-31G* and geometry optimized in the 6-31G* basis set, respectively.

Table I shows that extending the basis set from 3-21G to 6-31G* improves agreement
with experiment for the 2-oxo-pyridine tautomerism. Further geometry optimization
with the 6-31G* basis set moves the tautomerization energy in the right direction and
closer to the observed value, but the keto form is still the more stable.
Addition of the correlation energy calculated at the 6-31G*/MP2 level stabilizes
the hydroxy form by 1.7 kcal/mol. This overestimate is opposite in direction to that
obtained by Schlegel et al. [30] at the 6-31G/MP2 level, where the keto form was
stabilized by 0.8 kcal/mol.

The overall agreement of the calculated data for the gas-phase tautomeric equilibria
of 2-oxo-pyrimidine is satisfactory, given that the estimated error of the experimental
results is not known.

Experimental data for the gas-phase tautomerization energy of cytosine are not
available, and our calculations suggest that significant amounts of both forms should be
observable in the gas phase.

In Table II we summarize the results for the free energies and equilibrium constants
obtained here and in experiments. The ΔΔG_tot (column 5), which is the free
Table II. Comparison of the free energy differences (kcal/mol) and equilibrium constants
for tautomerization in H2O solution at 300 K obtained from MD simulation and experiment.

     Tautomer       N(H2O) in MD
n    pair           simulation     ΔΔG_solv       ΔΔG_tot             K_calc     ΔΔG_exp    K_exp

2-oxopyridine                      Ib → Ia        Ib → Ia             Ia/Ib                 Ia/Ib
1    Ib ⇌ Ia        600            [illegible]    b)  -5.2 ± 0.4      6,390      -4.14      910
                                                  c)  -4.7 ± 0.4      2,980
                                                  d)  -3.0 ± 0.4      164

                                   Ic → Ib        Ic → Ib             Ib/Ic
2    Ic ⇌ Ib        584            0.9 ± 0.1      b)  -5.0 ± 0.1      4,200
                                                  c)  -5.0 ± 0.1      4,200
                                                  d)  -4.9 ± 0.1      3,500

                                   Ic → Ia        Ic → Ia             Ia/Ic
3    Ic ⇌ Ia        584            -4.4 ± 0.1     b)  -10.2 ± 0.1
                                                  c)  -9.7 ± 0.1
                                                  d)  -7.9 ± 0.1      711,000

                                   Ic → Ia        Ic → Ia             Ia/Ic
3'   Ic ⇌ Ia        based on       -4.4 ± 0.4     b)  -10.2 ± 0.4
                    1 and 2                       c)  -9.7 ± 0.4
                                                  d)  -7.9 ± 0.4      711,000

2-oxopyrimidine                    IIb → IIa      IIb → IIa           IIa/IIb               IIa/IIb
4    IIb ⇌ IIa      578            -5.5 ± 0.4     -2.9 ± 0.2          154        <-1.64     >15
                                                                                            (ethanol)

cytosine                           IIIb → IIIa    IIIb → IIIa         IIIa/IIIb             IIIa/IIIb
5    IIIb ⇌ IIIa    588            -4.2 ± 0.2     [illegible]         1235       -6.3       39,810
                                                                                 -4.1       1,000

Levels of calculation b, c, d as in Table I. ΔΔG_tot is defined by Eq. (12). ΔΔG_solv is
the value calculated by Eq. (7). K is the equilibrium constant, either experimental or
calculated from ΔΔG.


energy  difference  between  tautomers  in  solution,  is  estimated  according  to  the  fol¬ 
lowing  equation: 

$$\Delta\Delta G_{tot} = \Delta\Delta G_{sol} + \Delta\varepsilon_0 + \Delta E_{QM} + \Delta S_{QM}, \qquad (12)$$

where $\Delta\Delta G_{sol}$ is the solvation free energy difference between the tautomers
calculated by the FEP/MD simulations, $\Delta\varepsilon_0$ is the difference in zero-point
vibrational energies between the two tautomers (from MINDO/3), $\Delta E_{QM}$ is the
calculated gas-phase tautomerization energy, and $\Delta S_{QM}$ is the difference in the
entropies, which could be calculated by quantum mechanical methods but which we assume
to be negligible, given the MINDO/3 results noted above. In general, there is qualitative
agreement in all cases except for the cytosine tautomers, where the experimental results
are rather inconclusive and depend on the measurement method [6,7]. One general conclusion
that can be derived from our calculations is that the keto and amino forms of these
molecules prevail in water solutions mainly because of the solvation effect.

The robustness of our method is supported by the fact that the sum of the free energies
for the three simulations involving the cyclic transformations Ia → Ib, Ib → Ic,
Ic → Ia, which should be exactly 0 kcal/mol, is indeed 0.0 ± 0.4 kcal/mol. This
accuracy is achieved despite a difference in the number of water molecules in the
simulations.
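
As an illustration of the two checks described above, the following minimal Python sketch converts a free energy difference into an equilibrium constant through K = exp(-ΔG/RT) and sums the legs of a closed thermodynamic cycle. It is not the authors' code. The function names are hypothetical, the quadrature combination of uncertainties assumes independent errors, and the three cycle legs are illustrative numbers rather than values from Table II. Equilibrium constants obtained this way can differ by a few percent from tabulated ones depending on the constants and rounding used.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def equilibrium_constant(delta_g, temperature=300.0):
    """Return K = exp(-dG/RT) for a free energy difference in kcal/mol."""
    return math.exp(-delta_g / (R_KCAL * temperature))


def cycle_closure(legs):
    """Sum free energies around a closed cycle.

    legs is a list of (value, uncertainty) pairs in kcal/mol. The total
    should be zero for a closed cycle; uncertainties are combined in
    quadrature, which assumes the legs are statistically independent.
    """
    total = sum(value for value, _ in legs)
    error = math.sqrt(sum(err * err for _, err in legs))
    return total, error


# A free energy difference of -3.0 kcal/mol at 300 K.
k_example = equilibrium_constant(-3.0)

# Hypothetical legs of a three-step cycle (illustrative values only).
closure, closure_error = cycle_closure([(2.0, 0.3), (1.5, 0.2), (-3.5, 0.2)])
```

A closure that lies within its combined uncertainty of zero indicates that the individual perturbation simulations are mutually consistent.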

The effect of solvation on the isomerization of the Ib to the Ic isomer also shows a
competition between intrinsic energies and solvation effects. Table II, simulation
no. 2, shows that the solvation free energy difference and the zero-point vibrational
energy contribution favor Ic (angle N-C-O-H = 180°), but the internal electronic
energy is lower by 6.0 kcal/mol for the Ib isomer; thus, on balance, the Ib form is favored.

In the case of the equilibrium IIa ⇌ IIb, only data measured in ethanol solution
are available. However, on changing the solvent from ethanol to water, the keto form,
with its higher dipole moment, will be preferentially stabilized. This will make the
difference in free energy more negative than -1.6 kcal/mol and, thus, closer to our
calculated value.

After our calculations were completed, we learned that Kwiatkowski et al.
[31,32] have carried out ab initio calculations of the relative gas-phase energies for
the nucleic acid bases at a level similar to that of our calculations for 2-oxo-pyridine.
They found that the MBPT(2) correlation correction calculated at the 6-31G* basis set
level stabilizes the amino-keto (IIIa) isomer over keto-imino (IIIb) cytosine by
0.7 kcal/mol. If this correction were added to our estimate of the IIIa-IIIb
tautomerization energy in solution, it would become -2.3 rather than -1.6 kcal/mol and
yield a result closer to the experimental data.


Association  of  Base  Pairs 

In Table III the association free energies for base pairs obtained by the FEP/MD
simulations are collected and compared with the available experimental data [9]. The
$\Delta G_{assoc(aq)}$ was calculated from the thermodynamic cycle presented in Figure 2 by
subtracting $\Delta G_{solv(A)}$ and $\Delta G_{solv(B)}$ (taken from Ref. 27) from the sum of
$\Delta G_{assoc(g)}$ and $\Delta G_{solv(A-B)}$. Each of these numbers was obtained from
numerical FEP/MD simulations; thus they should be regarded as "first principles" results.
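
The thermodynamic cycle just described amounts to a single line of arithmetic, sketched below in Python. The function name and the input numbers are hypothetical and serve only to show the sign convention; they are not values from the paper.

```python
def association_free_energy_aq(dg_assoc_gas, dg_solv_pair, dg_solv_a, dg_solv_b):
    """Close the thermodynamic cycle for association in water.

    dG_assoc(aq) = dG_assoc(g) + dG_solv(A-B) - dG_solv(A) - dG_solv(B)
    All quantities are in kcal/mol.
    """
    return dg_assoc_gas + dg_solv_pair - dg_solv_a - dg_solv_b


# Hypothetical inputs chosen only to illustrate the bookkeeping.
dg_aq = association_free_energy_aq(
    dg_assoc_gas=-10.0,   # gas-phase association
    dg_solv_pair=-15.0,   # solvation of the A-B complex
    dg_solv_a=-10.0,      # solvation of isolated A
    dg_solv_b=-8.0,       # solvation of isolated B
)
```

Because the complex usually loses part of the solvation enjoyed by the separated bases, the solution-phase association free energy is typically much less negative than the gas-phase value.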


Table III. Association free energy (kcal/mol) for nucleic acid base pairs in H-bonded and stacked
configurations in the gas phase and in water solution at T = 300 K.

                              A-T H-bond      A/T stack      G-C H-bond      G/C stack      A/A stack

Calculated ΔG_assoc(g)        -1.51 ± 0.23    0.38 ± 0.15    -6.78 ± 0.16    -2.61 ± 0.05   1.20 ± 0.12
Calculated ΔG_assoc(aq)       0.18            -1.86          -0.91           -2.22          -2.16
Experimental ΔG_assoc(aq)     -               -1.15          -               -0.71          -1.80

Experimental data were taken from Ref. 9.

The numerical results for $\Delta G_{assoc(g)}$ presented in Table III were corrected for
changes in vibrational entropies during complex formation at T = 300 K and for the
difference in standard state (i.e., 1 atm versus the 1 M state to which the experimental
data are referred), so that they can be compared with experimental values.
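
The text does not spell out the standard-state correction that was applied. A common choice, sketched below as an assumption rather than as the authors' procedure, is the ideal-gas conversion between a 1 atm and a 1 M reference state, ΔG(1 M) = ΔG(1 atm) + Δn RT ln(R'T), where Δn is the change in the number of species (-1 for A + B → AB) and R' is the gas constant in L atm/(mol K). The function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol K)
R_L_ATM = 0.082057     # gas constant in L atm/(mol K)


def to_one_molar(dg_one_atm, temperature=300.0, delta_n=-1):
    """Convert a reaction free energy from a 1 atm to a 1 M standard state.

    Assumes ideal behaviour. delta_n is (number of products) minus
    (number of reactants); for a bimolecular association it is -1.
    """
    shift = delta_n * R_KCAL * temperature * math.log(R_L_ATM * temperature)
    return dg_one_atm + shift


# Shift applied to an association free energy at 300 K (kcal/mol).
shift_300 = to_one_molar(0.0)
```

At 300 K this shift is close to -1.9 kcal/mol for an association reaction, which is comparable in size to the free energies listed in Table III, so the choice of standard state cannot be neglected when comparing with experiment.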

FEP/MD results show that the method properly predicts that H-bonded configurations
are preferred in the gas phase for adenine-thymine and guanine-cytosine pairs, as well
as the preference in water solution for the stacked configurations. This is because in
solution, water can form H bonds with the bases in the stacked configuration that are
comparable in strength to base-base hydrogen bonds.

For the $\Delta G_{assoc(aq)}$ no error bars are given in Table III, but we expect these to
be of the order of 2-3 kcal/mol. Despite these large errors, our simulations give
reasonable values for the absolute ΔG of association in solution, which are in the range
of experiment.


Conclusions 

The FEP/MD method seems to be general and is found to be the only acceptable
method to obtain equilibrium constants and free energy differences from simulations.
This method can also be regarded as a supportive and predictive tool for the
experimentalist, especially in the fields of site-specific mutagenesis, drug-DNA specific
interactions, and solvation studies. Our MD simulation results for the solvation free
energy differences between two isomers are in rather good agreement with experiment
and suggest that they can be used to predict equilibrium constants in other cases where
no experimental data are available.


Abstract

In heme proteins, the axial ligands bound directly to the iron are important modulators of biological
function. A common spectroscopic technique used to detect the presence of heme units with
oxygen-containing ligands is the broadening of the electron spin resonance (ESR) spectrum by hyperfine
interactions with unpaired spin density in ¹⁷O-enriched systems. For this technique to be useful as a means
of identifying such ligands, there must be a measurable level of unpaired spin density on the oxygen ligand.
In this study, we have used the semiempirical INDO/ROHF quantum mechanical method to calculate and compare
the spin density localized on the axial oxygen ligand in the active site of four model heme proteins:
metmyoglobin (MetMb), cytochrome c peroxidase (CCP), P450cam, and catalase. In particular, we have attempted
to determine for which systems the results of such an experiment would be a reliable indicator of the
presence of water or other types of oxygen-containing ligands. Using the MetMb system, for which such
broadening has been observed, to determine a threshold value of spin density on the oxygen atom needed to
detect broadening of the ESR spectra, we have found about one-hundredth as much spin on the water ligand in
P450cam, thus explaining the observed lack of broadening in the ESR spectra of the low spin resting state.
In addition, we predict that the catalase system would, in principle, exhibit ¹⁷O broadening of its ESR
spectra but that CCP would not. Finally, given the similarity of CCP and HRP (horseradish peroxidase), our
calculations suggest that the absence of broadening in the ESR spectra of HRP does not rule out the presence
of water as a sixth ligand.


Introduction 

All heme proteins share a common active site or prosthetic group consisting of an
iron porphyrin (heme) unit complex which is a nearly planar entity embedded in the
globular protein. The family of heme proteins can be divided into three classes
according to their primary biological function: (1) oxygen transport proteins such as
myoglobin and hemoglobin, (2) electron transfer agents such as the cytochromes, and
(3) oxidative metabolizing enzymes such as the peroxidases, catalases, and
cytochrome P450s. In all classes, the biological function is centered on the heme unit
and primarily on the iron itself [1-5]. Thus, the oxidation and spin state of the iron,
the nature of the axial ligands, and the protein environment of the heme unit serve as
subtle modulators of biological behavior. The heme group is also the principal origin of
spectroscopic features of these proteins. Both electronic spectra [6-9] and ground-state
electromagnetic properties, such as quadrupole splitting in Mössbauer resonance
spectra [10-14], anisotropic g values and anisotropic hyperfine splitting in electron and
nuclear spin resonance spectra [15-23], and temperature-dependent magnetic moments
[24-29], originate almost entirely on the heme unit. Thus, these spectroscopic
properties can serve as sensitive probes of the stereoelectronic features of the heme unit
central to biological function. These techniques have been very useful in probing such
important properties as the oxidation state, the low-lying spin states, and the nature
of the axial ligands bound directly to the iron in the heme unit of different families of
heme proteins. In addition, since the pioneering x-ray crystal structure determination
of myoglobin [30], rapid advances in the field of protein crystallography have led
to structural elucidation of a large number of heme proteins [30-33]. Paradoxically,
while a number of long-standing problems are being resolved by these combined
techniques, new ones are emerging. In the work reported here, the techniques of
theoretical chemistry are used to help resolve apparently conflicting observations about
the nature of the axial ligand bound to the iron which have been deduced from x-ray
crystal structure and electron spin resonance (ESR) spectra studies of heme proteins.

Metmyoglobin (MetMb), the oxidized form of myoglobin, has long been known to
exist in a high spin, ferric state in which the iron is axially bound to an imidazole
group of a nearby histidine residue and to water [27,31]. However, the nature of the
axial ligands in the various classes of metabolizing heme proteins has, at least until
very recently, been a much less settled question [1-5]. For the cytochrome
P450s, a great deal of indirect experimental evidence has accumulated over the years
which established a sulfur-containing moiety, most likely cysteine, as one of the axial
ligands [5]. This deduction has recently been confirmed by the high resolution x-ray
crystal structure determination of a soluble P450, camphor-bound P450-camphor
(P450cam) [34]. As also suspected, in this substrate-bound, high spin form, there
is no second axial ligand. The question then still remained of the nature of the sixth
ligand thought to be responsible for the low spin ground state of the substrate-free,
resting state of this enzyme. As recently reviewed [5], detailed spectroscopic
measurements, including MCD and optical spectra, appeared to indicate that in the low
spin ferric resting state either a nitrogen-containing ligand such as an imidazole, or
an oxygen-containing ligand such as a serine or threonine, was bound to the iron.
Surprisingly, the x-ray structure of camphor-bound cytochrome P450 revealed no such
amino acid close enough to be capable of binding as an axial ligand to the Fe [34]. A
more recent x-ray structure determination [35] of the substrate-free P450cam supports
this original assessment. Instead, in the absence of camphor, water appears to bind as
the second axial ligand to the Fe with an Fe-O distance of 2.28 Å. There is also some
evidence that the water ligand is part of a self-contained H-bonded network with
4 more water molecules in the substrate-binding site, which do not interact with the
lipophilic residues that comprise this site. Few significant protein conformational
changes were noted between the substrate-free and substrate-bound states. Thus, it would
appear that this new study has determined that the axial ligands in the resting state of
cytochrome P450cam are a cysteine sulfur and a water oxygen moiety.

In an attempt to independently verify the nature of the second axial ligand, the ESR
spectrum of P450cam in 65% ¹⁷O-enriched H2O was recently determined [36]. Magnetic
nuclear-electron spin interactions (I · A · S) in paramagnetic molecules, arising from
electron-nuclear dipolar coupling or isotropic contact interactions, lead to a hyperfine
splitting of their electron spin resonance absorption spectra. In a well developed
formalism reviewed in detail elsewhere [37], it has been shown that the magnitude of
the splitting is determined by the isotropic and anisotropic coupling constants, which
are proportional to the spin density at the nucleus of each atomic species. The
isotropic contact terms, which usually make a major contribution to the coupling
constants and hence to observed splittings, are non-zero only if the nuclei on which they
are centered have non-zero nuclear spins. Thus, for paramagnetic ferric heme proteins,
this method is useful as a probe of the identity of axial ligands provided that the
ligand has a nuclear spin and also sufficient spin density centered on it to give a
coupling constant large enough to result in observable splitting or at least broadening
of ESR resonance lines. To use this method as a probe for water as an axial ligand,
since ¹⁶O does not have a nuclear spin, it is necessary to perform the ESR experiments
in water highly enriched with ¹⁷O. If broadening or splitting is obtained in such ESR
experiments, it can be taken as strong evidence for the presence of an exchangeable
O ligand such as water. For P450cam, however, no detectable broadening was
observed in any of the three anisotropic ESR lines, not even in the sharp derivative
line corresponding to g = 2. These negative results are in apparent conflict with
the x-ray structure, which indicates water oxygen as the most probable axial ligand.
By contrast, in earlier work [38], such broadening was seen for MetMb, known
from its x-ray structure [31] to have an H2O molecule as a sixth ligand. The question
addressed in this work is then: Can this apparent inconsistency in the ESR and x-ray
data of P450cam be resolved and the origin of differences between P450cam and
MetMb understood?

More generally, the question we are addressing is whether negative results in such
ESR experiments need always be interpreted as the absence of water as an axial ligand,
or whether in the low spin form of P450 and in other ferric heme systems there could be
an axial water ligand which does not make itself known through broadening of the
g = 2 line in the presence of ¹⁷O-enriched H2O. In addition to P450cam, no such
broadening was seen [38] for another type of metabolizing heme protein, horseradish
peroxidase (HRP), a high spin ferric heme protein with an imidazole axial ligand,
resembling MetMb. It was hence concluded that H2O is not a sixth ligand in that
protein, though other indirect evidence points to its presence. There is as yet no x-ray
structure for HRP. However, there is now a 1.7 Å resolution structure for a closely
related heme protein, cytochrome c peroxidase (CCP) [33], which does have water as an
axial ligand, as well as an imidazole, as in MetMb. Should one expect to see comparable
broadening of the ESR signal of this high spin ferric heme protein in the presence of
¹⁷O-enriched water? No such studies have as yet been reported.

In an interesting variation on the subject of axial ligands, a recent x-ray structure of
another type of oxidative metabolizing heme protein, catalase, has been reported [32].
Peroxidases and catalases share common first steps in their metabolic cycle [2-4],
both being twice oxidized by peroxide to their active Compound I form. However,
while peroxidases have relatively nonspecific substrates, catalases have a high degree
of substrate specificity and a preference for unsubstituted hydrogen peroxide. This high
spin ferric heme protein was found to have a single axial ligand, a tyrosine residue
(Tyr357), which, at physiological pH, presumably binds primarily as a phenolate.
If some method could be found to use ¹⁷O-enriched tyrosine, would there be a
detectable broadening of the ESR signal of this protein?

In this study, then, we have used the x-ray structures of the active sites of these four
classes of heme proteins (those of cytochrome P450cam [34] and CCP [33], kindly given
to us by Dr. Thomas Poulos; of catalase [32], generously provided by Dr. Michael
Rossmann; and of MetMb [31], obtained from the Brookhaven Data Bank [39]) to
calculate the electron and spin distribution in low-lying sextet, quartet, and doublet
states and to compare the extent of spin delocalization on the oxygen ligands with that
calculated for MetMb, for which ¹⁷O broadening was detectable. The results have
allowed us to predict the systems for which hyperfine broadening of the ESR spectra
would be observable, and to what extent this property is a good measure of the
presence of oxygen-containing axial ligands.

Methods  and  Procedures 

All studies of the iron-porphyrin complexes have been made using an INDO/SCF/CI
program [40] described in detail elsewhere [41-43]. It was developed primarily in the
laboratory of Dr. Michael Zerner with the collaboration of this laboratory. This
program includes parameterization for transition metals and has configuration interaction
capabilities allowing calculation of electronic spectra. Recently, an open shell RHF
formalism has been implemented which allows calculation of low-lying states of
different multiplicity without spin contamination. This INDO/RHF procedure was used to
calculate the electron and spin distribution and relative energies of the lowest lying
sextet, quartet, and doublet state for each model heme system, corresponding to 5, 3,
and 1 unpaired electron states, respectively. A Mulliken population analysis [44] was
then used with INDO deorthogonalized orbitals to calculate the electron density on
each atomic center. Within the RHF formalism, the electron distribution on each
atomic center summed over the open shell molecular orbitals yields, by definition, the
spin density distribution on each atomic center.
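
The last step can be illustrated with a short NumPy sketch of a Mulliken spin population analysis restricted to the singly occupied orbitals. This is not the INDO program used in the study. It assumes that the molecular orbital coefficients are already expressed in the non-orthogonal atomic orbital basis (that is, after deorthogonalization) and that each open-shell orbital carries exactly one unpaired electron. All names are hypothetical, and the two-orbital test system is a toy example.

```python
import numpy as np


def mulliken_spin_populations(coeffs, overlap, open_shell, ao_to_atom, n_atoms):
    """Mulliken spin population per atom from singly occupied MOs.

    coeffs     : (n_ao, n_mo) MO coefficients in the non-orthogonal AO basis
    overlap    : (n_ao, n_ao) AO overlap matrix S
    open_shell : indices of the singly occupied MOs
    ao_to_atom : atom index for each AO
    """
    c_open = coeffs[:, open_shell]
    # Spin density matrix: one unpaired electron per open-shell orbital.
    spin_density = c_open @ c_open.T
    # Gross AO populations are the diagonal elements of (P S).
    ao_pop = np.einsum("ij,ji->i", spin_density, overlap)
    atom_pop = np.zeros(n_atoms)
    np.add.at(atom_pop, np.asarray(ao_to_atom), ao_pop)
    return atom_pop


# Toy system: two equivalent centers with overlap s, one unpaired electron
# in the bonding combination.
s = 0.25
S = np.array([[1.0, s], [s, 1.0]])
bonding = np.array([1.0, 1.0]) / np.sqrt(2.0 + 2.0 * s)
antibonding = np.array([1.0, -1.0]) / np.sqrt(2.0 - 2.0 * s)
C = np.column_stack([bonding, antibonding])

populations = mulliken_spin_populations(C, S, [0], [0, 1], 2)
```

For the toy system the unpaired electron is shared equally between the two centers, and the populations sum to the number of unpaired electrons, as they must for any normalized set of open-shell orbitals.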

The model heme active sites used in this calculation for MetMb, CCP, P450cam,
and catalase are shown in Table I. They each consist of a porphine ring, the iron, and
its axial ligands. For each protein, this active-site geometry was taken from the
corresponding x-ray structure as indicated. These active sites represent the resting state
of each protein. In the resting state of P450cam, while an oxygen atom has clearly been
identified as an axial ligand, the exact nature of this ligand is not totally clear. It is
possible that if it is a water molecule involved in an H-bonded network with 4 others,
as suggested [35], it could have some degree of anionic, OH⁻ character. It also
appears that while the iron moves toward the heme plane in the camphor-free state, it
remains out of plane toward the cysteine. In previous calculations [43], we have shown
that whether the axial ligand is water or hydroxide-like, and the extent to which the Fe
moves into the heme plane, can determine the high spin/low spin equilibrium in the
resting state. We have, therefore, calculated the electron and spin distributions for
several models of the resting state of P450cam.

Table I. Geometry of four model heme proteins.

a X-ray structure [Ref. 31].
b X-ray structure; oxygen is slightly off axis [Ref. 33]. Long Fe-O bond length
could be due to presence of polar His52 and Trp51, absent in MetMb.
c Coordinates from x-ray structure of camphor-bound P450cam [Ref. 34] with ligand
distances from x-ray structure of camphor-free P450cam [Ref. 35].
d X-ray structure from Dr. Michael Rossmann; oxygen is off axis, ∠FeOCζ = 131°
[Ref. 32].
e Distance the Fe is out of the plane of the porphyrin.

Results  and  Discussion 

For MetMb, catalase, and CCP, the high spin state was calculated to be lower in
energy than the doublet state, with a quartet state nearly degenerate with the sextet
state, consistent with experimentally observed temperature-dependent magnetic
moments [25-29]. The high spin ground state obtained for catalase is typical of a
five-coordinated ferric-heme complex with a single anionic axial ligand. Such a
ground state occurs, for example, in the camphor-bound P450cam in which the single
axial ligand is a mercaptide. For camphor-free P450cam, with water and mercaptide
as axial ligands, a low spin state is calculated to predominate if the Fe moves into the
heme plane and the Fe-O distance shortens to 2.00 Å, or if there is some anionic
character in the water ligand. Such charge transfer could be caused by the postulated
interaction of a water ligand with a network of H-bonded waters or other H-acceptor
groups. For example, if we assume an OH⁻ as an axial ligand with an Fe-O bond
length of 1.75 Å, the low spin state is 16 kcal/mol lower in energy than the high
spin state.

From the calculated spin distributions shown in Tables II and III, we see that for
no model heme protein, in any spin state, is there a large amount of unpaired spin on
the oxygen ligand. While all values are small, systematic variations observed among
these very similar systems should be reliable. From the spin distribution on the
oxygen ligands shown in Table II, we see that for all model heme systems, the spin
density decreases with lower spin states, in the order S = 5/2 > 3/2 > 1/2. Also,
calculated spin densities are appreciably larger on anionic than on neutral oxygen
species.

The water oxygen of the high spin ferric MetMb was calculated to have 0.057 e, or
about 1.1% of the total spin. For this protein, a detectable amount of broadening of
the g = 2 signal was observed in the ESR spectra in the presence of ¹⁷O-enriched H2O
[38]. Thus, the calculated values of the oxygen spin densities for the other model heme
proteins compared to MetMb can be used as a measure of the relative extent of
broadening that would be observed for them.

By far the largest spin density found for an oxygen ligand is that on the phenolate
oxygen of the high spin ferric catalase. This result is not surprising since this
aromatic anion is more tightly bound to the iron than are the water ligands. Even in this
system, however, only 5% of the total spin density is on the oxygen atom. If it is
possible to exchange the tyrosine phenol oxygen for ¹⁷O and incorporate it into the
protein, a significant broadening of the ESR lines should be observed.* Even if the
tyrosine is protonated, broadening should be detectable, since the results in Table II
show that the oxygen spin density on phenol is larger than that calculated for MetMb.

Using the crystal structure geometry, the low spin ferric P450 system has virtually
no spin density on the oxygen of the axial water ligand, with unpaired spin =
0.0005 e, about 100 times less than that of MetMb. Allowing the Fe-O distance of the
water ligand to decrease, or simulating its ionic character by OH⁻, leads in both cases
to low spin ground states and somewhat increases the spin density on the oxygen, but it
remains at best about 1/6 that of MetMb. Since the broadening in MetMb was barely
detectable, no measurable effect on ESR spectra would be observed for any low spin
model of P450cam currently proposed. These results then account for the absence of
a detectable broadening of the P450 ESR signal in the presence of a high concentration
of protein and 65% ¹⁷O-enriched water, in a manner consistent with the presence of
water as an axial ligand in the resting state of P450cam.
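
The comparisons quoted in the preceding paragraphs follow from simple ratios of the oxygen spin densities listed in Table II. The short Python sketch below reproduces them; the dictionary layout and function names are hypothetical, and only the numerical entries are taken from the table.

```python
# Oxygen spin densities (units of e) and number of unpaired electrons,
# taken from Table II for the crystal-structure geometries.
OXYGEN_SPIN = {
    "MetMb, S=5/2, H2O": (0.057, 5),
    "CCP, S=5/2, H2O": (0.017, 5),
    "P450cam, S=1/2, H2O": (0.0005, 1),
    "Catalase, S=5/2, phenolate": (0.246, 5),
}

REFERENCE = "MetMb, S=5/2, H2O"


def fraction_of_total_spin(label):
    """Oxygen spin density divided by the number of unpaired electrons."""
    spin, unpaired = OXYGEN_SPIN[label]
    return spin / unpaired


def relative_to_reference(label, reference=REFERENCE):
    """Oxygen spin density relative to the MetMb reference value."""
    return OXYGEN_SPIN[label][0] / OXYGEN_SPIN[reference][0]


ratios = {label: relative_to_reference(label) for label in OXYGEN_SPIN}
```

With these entries the MetMb oxygen carries about 1.1% of the total spin and the catalase phenolate oxygen about 5%, while the low spin P450cam value is roughly one-hundredth of the MetMb reference.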

Turning now to the fourth model heme protein considered, CCP, the greatly
diminished spin densities on the water oxygens in the high spin state compared to MetMb
allow us to predict that no detectable broadening will be seen due to ¹⁷O-enriched
H2O in the ESR spectra of CCP. Since CCP is similar to HRP, these results also imply
that water can be a sixth ligand in that system despite the absence of a detectable ¹⁷O
broadening. One of the reasons for the diminished spin density on the water oxygen
in CCP, compared to MetMb, could be the much longer Fe-O bond length, 2.4 Å
in CCP compared to 1.9 Å in MetMb. This bond lengthening has been attributed [11]
to the presence of polar residues around the water-binding site which can H-bond to
it in CCP, residues which are absent in MetMb. To simulate this effect, the CCP
system was also calculated with an OH⁻ as a sixth ligand. In this system, the spin
density on the oxygen increased to a value larger than that for MetMb. If there is
such appreciable charge donation to the water in CCP, or if the experiments are done
at high pH, it is possible that broadening of the CCP and, by analogy, the HRP
spectra would be observed, another possible indication of a water ligand in the latter
system. However, under these conditions the peroxidases are likely to become low spin
and the spin density on the oxygen would again be too low to detect (Table II).

There are also polar groups around the histidine residue in CCP which are absent
in MetMb. As shown in Table II, simulating their effect by an anionic imidazole
(imidazolate) ligand does not appreciably change the spin distribution on the oxygen.

*A plausible procedure for preparation of ¹⁷O-enriched tyrosine is to prepare it by use of phenylalanine
hydroxylase in the presence of gaseous ¹⁷O2, with phenylalanine as a substrate. The ¹⁷O-enriched tyrosine
could then be incorporated into bacterial catalase using a tyrosine-deficient mutant form (H. Beinert,
private communication).

Table II. Spin density* on oxygen ligand in different spin states of four model heme proteins.

              [MetMb]+1     [CCP]+1             [P450]0             [Catalase]0

L             Imidazole     Imidazole(b)        SCH3-               -
Lo            H2O           H2O                 H2O                 Phenolate
S = 5/2       0.057         0.017               0.01 (0.03)(c)      0.246
    3/2       0.030         0.016               -    (0.03)(c)      0.208
    1/2       0.004         0.0004              0.0005 (0.004)(c)   0.052

Lo                          OH-                 OH-                 Phenol
Fe-O                        2.40                1.75                1.76
S = 5/2                     0.08 (0.07)(d)      0.16                0.088
    3/2                     0.07 (0.06)(d)      0.01                0.077
    1/2                     0.0004 (0.0003)(d)  0.01                0.03

* Spin densities given in units of e- and distances in Å.
(b) With imidazole anion as L, no significant change in spin on oxygen.
(c) Calculated for Fe-OH2 distance of 2.00 Å and Fe in plane.
(d) Values obtained with L = imidazole anion, Lo = OH-.

Table III. Spin density* distribution in high and low spin states of four model heme proteins.

              [MetMb]+1         [CCP]+1           [P450]0           [Catalase]0
Lo            H2O               H2O               H2O               Phenolate
L1            Imidazole         Imidazole         SCH3              -
S             5/2     1/2       5/2     1/2       5/2     1/2       5/2     1/2

Fe            4.29    0.94      4.25    0.93      4.20    0.93      4.18    0.92
L1 (N or S)   0.09    0.001     0.08    0.003     0.04    0.04      -       -
   other      0.04    0.002     0.03    0.009     0.26    0.00      -       -
Lo (O)        0.057   0.004     0.017   0.0004    0.01    0.0005    0.25    0.05
   other      0.017   0.000     0.004   0.0002    0.01    0.0000    0.05    0.003
Porphyrin     0.51    0.053     0.62    0.06      0.48    0.04      0.52    0.028

* Spin densities given in units of e-.

In summary, we have calculated and compared the spin density localized on the
axial oxygen ligand in the active site of four model heme proteins, MetMb, CCP,
P450cam, and catalase, using the x-ray structure geometries. Our results reveal no
spin density on the water oxygen atom of the low spin state of P450cam. This
reconciles, in a consistent manner, the lack of observation of ¹⁷O broadening in the ESR
spectra of this system with the crystal structure determination of the presence of a
water ligand in the resting state. Using the MetMb system to determine a threshold
value of spin density on the oxygen to cause detectable broadening of the g = 2 line
in the ESR spectra, we predict that the catalase system would, in principle, exhibit ¹⁷O
broadening of its ESR spectra but that CCP would not. Given the similarity of HRP to
CCP, we also conclude that the absence of broadening of the ESR spectra of that
enzyme does not rule out the presence of water as a sixth ligand, as is the case in CCP.
Further experiments should test these inferences made from small but systematic
variations found in the spin distribution of these heme systems.
Abstract

The equilibrium geometry, stabilization energy, and electric polarizability of formic acid, formamide,
and the three possible cyclic hydrogen-bonded pairs are obtained by ab initio calculations using the
STO-3G, 4-31G, and 6-31G** bases. These three properties are found to be very much dependent on the basis
set extension. The polarizability of the dimers is found to be basically additive in contribution from the
monomeric moieties.


Introduction 

Increasing attention is being given to studies of the dielectric and electronic properties
of biological materials and to the ways in which they interact with electromagnetic
energy [1,2]. It has also been realized that the nonlinear optical properties of
some biological organic molecules are measurable [3,4] and potentially useful in
quantum electronic devices [5]. One characteristic feature of many biological materials
is to adopt a specific order due to the presence of hydrogen bonds. In the design
of new synthetic materials with interesting electroactive properties the control of the
organization at the molecular level is very important [6]. Thus it can be beneficial to
learn from nature ways to achieve molecular organizations of specific interest.

Various approaches aimed at molecular organization exist and are actively studied:
Langmuir-Blodgett film deposition, topochemical solid-state polymerization, liquid
crystal formation, and so on. In the case of liquid crystalline systems, chain ordering
in the nematic phase has recently been demonstrated for an organic system containing
a carboxylic group which dimerizes by hydrogen bond formation [7]. Likewise, the
property of hydrogen bonding to impose specific patterns on the molecular organization
has been used elegantly in attempts to engineer molecular ferromagnets [8].
These two examples suggest potential applications of the hydrogen bond to control
the structure of conjugated organic chains promising for their electrooptic properties.
For example, phenomenological models predict an L³ dependence of the polarizability
with respect to the length L of the conjugation path [9-12]. In long chains, however,
conformational freedom can result in defects and twists which tend to destroy
the extent of the conjugation and thus spoil the expected benefit of the L³ behavior.
One possibility for preventing and/or minimizing the occurrence of these undesirable
effects could be to prepare "long" conjugated chains with carboxylic and/or amide
end groups capable of hydrogen bond pairing between neighboring chains as
schematically indicated in Figure 1. Small chain systems of that type do exist and have
been extensively studied from the point of view of their molecular packing [13,14].
The synthesis of longer conjugated chains with end groups capable of forming hydrogen
bonds is conceivable, but whether the overall chain conjugation will be interrupted,
reduced, or enhanced by the cyclic hydrogen-bonded pairs remains an
important factor to know beforehand. Indications can be obtained from the polarizability
of such pairs compared to that of their monomers.

The components of the electrical polarizability tensor can in principle be estimated
from quantum mechanical calculations, but serious conceptual and computational
problems related to its practical evaluation are still unsolved [15]. Moreover, as indicated
by numerous studies on simple compounds, the properties of hydrogen-bonded
systems are rather difficult to obtain compared to those of more strongly bonded systems.
The presently suggested computational strategies to approach such problems
tend also to differ: to some authors large basis sets cannot be avoided, while others
suggest that more limited basis sets can be used in a sensible way provided polarization
functions are included in the basis describing the hydrogen atoms. Finally, a
question that has received comparatively less attention is the effect of the geometry
reorganizations taking place in the paired monomers and how to relate their properties
to those of the isolated constituents when applying methods to correct for the basis
set superposition error.

All these unsolved questions have relevance to the present work, and in spite of
this largely unsettled situation, we have calculated the equilibrium geometry and
electrical polarizability of formic acid, formamide, and their three possible cyclic
hydrogen-bonded pairs. The results reported hereafter are meant to serve as reference
for further studies on related but more realistic systems which, for practical reasons,
will imply the use of basis sets of limited extent. Our expectations are that the qualitative
trends obtained on the energies, most stable geometries, and electrical polarizabilities
are reliable. The average polarizability has been calculated for the
equilibrium geometries corresponding to the three basis sets (STO-3G, 4-31G, and
6-31G**) considered. In all cases a correction to the basis set superposition error
(BSSE) has been applied using the original form of the counterpoise (CP) method.


Figure 1. Schematic representation of conjugated skeletons connected by cyclic
hydrogen-bonded pairs involving carboxylic and amide functional groups.
Methodology

Calculations on the monomers and dimers, shown in Figure 2, have been performed
with the Gaussian 82 series of programs [16], adapted for an FPS-164 processor
attached to an IBM 4341 computer. A code for the finite-field steps has been
added to the Gaussian 82 software. All the geometric parameters, denoted according
to the general convention in Figure 3, for the molecules shown in Figure 2 and
constrained to be planar, were optimized for each of the basis sets considered, namely,
the minimal STO-3G basis set, the split-valence 4-31G basis, and the 6-31G** split-valence
basis including polarization functions [17]. The 6-31G** polarization basis set
has been selected because it was designed for the description of weakly bonded systems
such as those in which hydrogen is a bridging atom; it includes a single set of
Gaussian p-type functions for each hydrogen atom. The Fletcher-Powell procedure
was used to minimize the forces on the nuclei. The standard threshold conditions of
the Gaussian 82 program have been kept: 10 a.u. for the two-electron cutoff, 1CT*
for the requested convergence on the density matrices, and 5 × 10⁻⁴ Hartree Bohr⁻¹
as the maximal residual force on the Cartesian components.

The Cartesian frame with respect to which the diagonal components $\alpha_{ii}$, $i = x$, $y$,
and $z$, of the electric polarizability tensor are calculated is given in Figure 3.
Polarizability calculations have been performed using the finite-field (FF) approach [18,19],
where a term $-\boldsymbol{\mu}\cdot\mathbf{F}$ describing the interaction between an external homogeneous field
$\mathbf{F}$ and the molecular dipole moment $\boldsymbol{\mu}$ is added to the unperturbed molecular Hamiltonian.
In the presence of the external electric field, the molecular dipole moment is
dependent on $\mathbf{F}$, and the components $\alpha_{ii}$ of the electric polarizability tensor $\boldsymbol{\alpha}$ can be
expressed as

$$\alpha_{ii} = \left[\frac{\partial \mu_i(\mathbf{F})}{\partial F_i}\right]_{\mathbf{F}=0}, \qquad i = x, y, \text{ and } z. \tag{1}$$

In practice they are evaluated utilizing an approximate differentiation operator,

$$\alpha_{ii} = \frac{\mu_i(F_i) - \mu_i(-F_i)}{2F_i}, \qquad i = x, y, \text{ and } z, \tag{2}$$

in conjunction with the Romberg algorithm [20]. The values of the electric field components
$F_i$ actually used in the numerical procedure to compute $\alpha_{ii}$ are equal to ±0.001
and ±0.002 au (1 au of electric field = 5.1423 × 10¹¹ V m⁻¹). As found in other calculations
[21,23], we checked that these external electric field values yield consistent
and numerically accurate polarizabilities. The values of the polarizability reported in
this article are expressed in au (1 au of polarizability = 1.6488 × 10⁻⁴¹ C² m² J⁻¹).

Figure 2. Schematic representation of the formic acid, formamide, and the three complexes
considered in the present work.

Figure 3. Notational convention for the geometrical parameters of the molecules and
dimers optimized in this work.
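The finite-field recipe described in this section can be illustrated with a minimal Python sketch. It is not the code added to Gaussian 82: the dipole function `model_dipole` and its coefficients are hypothetical stand-ins for a quantum chemical dipole calculation, and the Romberg step is assumed here to be a single Richardson extrapolation combining the central differences obtained at the two field strengths 0.001 and 0.002 au.

```python
def model_dipole(field, mu0=0.5, alpha=20.0, gamma=3000.0):
    """Hypothetical dipole component (au) as a function of field strength (au).

    The cubic term mimics a hyperpolarizability contribution that contaminates
    a plain central difference.
    """
    return mu0 + alpha * field + gamma * field**3 / 6.0


def central_difference(mu, h):
    """Symmetric finite-difference estimate of d(mu)/dF at F = 0, as in Eq. (2)."""
    return (mu(h) - mu(-h)) / (2.0 * h)


def finite_field_alpha(mu, h=0.001):
    """One Romberg (Richardson) step using field strengths h and 2h.

    The leading h**2 error of the central difference cancels in the combination.
    """
    d_h = central_difference(mu, h)
    d_2h = central_difference(mu, 2.0 * h)
    return (4.0 * d_h - d_2h) / 3.0


if __name__ == "__main__":
    print("central difference :", central_difference(model_dipole, 0.001))
    print("Romberg-improved   :", finite_field_alpha(model_dipole))
```

For the model dipole above, the plain central difference carries an error proportional to the square of the field, whereas the extrapolated value recovers the linear coefficient to within rounding. This is one way to check, as the text does, that the chosen field values give numerically consistent polarizabilities.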

Due to the weakness of the hydrogen bond, incompleteness of the atomic basis set
can cause properties such as the interaction energy, optimal geometries, dipole
moment, etc. to be significantly in error (basis set superposition error, or BSSE). The
situation is expected to worsen in the case of the electric polarizability, for which it is
known that quite diffuse functions are required. The trouble is that no completely reliable
scheme seems to exist yet for eliminating or estimating superposition errors.
Many papers [24-33] dealing with this problem have recently appeared in the literature.
They basically investigate variants of the full counterpoise correction originally
proposed by Boys and Bernardi [24], but no conclusive indication seems to emerge
as to which approach should definitely be preferred. Therefore in this work,
we have adopted the conservative choice of using the original CP method [24].
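For orientation, the full counterpoise scheme referred to above is commonly written as follows. The notation is an assumption made for this sketch and does not appear in the original text: $E_X(\text{basis})$ denotes the energy of fragment $X$ computed in the indicated basis, with $A$ and $B$ the two monomers and $AB$ the dimer. The geometry at which each monomer is evaluated is a separate choice, discussed in the next paragraph.

```latex
% Counterpoise-corrected interaction energy: every term uses the full dimer basis,
% the partner's functions being kept as "ghost" functions in the monomer runs.
\Delta E^{\mathrm{CP}} = E_{AB}(AB) - E_{A}(AB) - E_{B}(AB)

% Corresponding estimate of the basis set superposition error (a non-negative
% quantity, since enlarging the basis can only lower a variational monomer energy).
\delta^{\mathrm{BSSE}} = \bigl[E_{A}(A) - E_{A}(AB)\bigr] + \bigl[E_{B}(B) - E_{B}(AB)\bigr]
```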

Owing to the fact that important geometry relaxations take place upon dimerization,
the counterpoise correction can be applied in many ways. For instance the
"ghost" functions could be centered on positions characteristic of the optimized geometry
of the isolated monomer or on those of the monomer in the relaxed geometry
of the dimer. Thus to study the influence of different possibilities on the calculated
properties (Mulliken charge indices, dipole moments, stabilization energy, and average
polarizability), several arrangements of the molecular skeletons and/or the
centers of the "ghost" basis functions, denoted by the capital letters A to Q in Figure 4,
have been considered. In the figure the symbols (m) and (d), respectively, indicate
the optimized structure of the isolated monomer and the monomeric moiety as
it has relaxed in the cyclic hydrogen-bonded pair. In the hypothetical pairs where the
monomeric moieties enter with the geometry of the isolated monomer, namely cases C,
H, M, and N, the H1–C2 and C8–H7 directions of the partner molecules (or of the set
of ghost functions) have arbitrarily been aligned, and the C2···C8 distance has been
chosen to be that of the corresponding fully optimized dimer.
(Figure 4 panels: "Formamide: monomer and dimer"; "Constituents of the formic acid and formamide pair".)

Figure 4. Graphical representation of the various study cases on isolated monomers,
monomers in presence of the "ghost" functions (shaded areas in the figure), and dimers. (m)
or (d) indicates which optimized geometry is used for the monomeric moiety, either that of the
isolated monomer (m) or the monomer in the dimer (d). Capital letters label the different cases.


Results  and  Discussion 

The  molecular  geometries  predicted  from  geometry  optimization,  net  atomic 
charges,  overlap  populations,  dipole  moments,  and  stabilization  energies  are  reported 
below  while  the  results  on  the  average  polarizability  are  discussed  in  a  later  section. 

Basis  Set  Dependence  of  the  Geometry,  Charge  Indices,  and  Dipole  Moments. 

Geometries. Formic acid and formamide are the simplest isoelectronic molecules
capable of forming cyclic hydrogen-bonded pairs. As such they have been studied extensively
both theoretically and experimentally. Since polarizability is a property sensitive
to structure, it is important to use the optimum geometries corresponding to
each basis set. To comply with these requirements, we have performed complete geometry
optimizations at the STO-3G, 4-31G, and 6-31G** levels for all the molecules
represented in Figure 2, where planarity was the only constraint imposed. Imposing
planarity is justified by previous calculations; it is only questionable in the case of the
formamide monomer, for which the pyramidal form is calculated at the STO-3G level
to be 2.28 kcal mol⁻¹ more stable than the planar variety [34]. However, Sapse et al.
[35], using the STO-6G, 6-31G, and 6-31G** bases, find the planar form to be the
most stable structure.

Table  I  shows  the  optimum  geometrical  parameters  defined  in  Figure  3  for  the  five 
molecules.  The  electron  charge  distribution  indices — net  atomic  charges,  overlap 
populations,  and  dipole  moments — belonging  to  the  investigated  molecules  are  dis¬ 
played  in  Table  II  where  capital  letters  A  to  Q  refer  to  the  notation  introduced  in  Fig- 


Table I. Evolution of the STO-3G, 4-31G, and 6-31G** geometrical parameters, optimized
bond lengths (in Å), and angles (in degrees) for formic acid, formamide, and their dimers.

        STO-3G           4-31G            6-31G**          Exper.

Formic acid: monomer and dimer
r1      1.103 (1.107)    1.072 (1.072)    1.085 (1.084)    1.097a (1.082)b
r2      1.386 (1.348)    1.342 (1.313)    1.321 (1.298)    1.342 (1.323)
r3      0.990 (1.009)    0.956 (0.975)    0.949 (0.962)    0.972 (1.036)
r5      1.214 (1.231)    1.200 (1.217)    1.182 (1.196)    1.204 (1.220)
r6      (2.536)          (2.708)          (2.789)          (2.703)
α1      110.4 (112.1)    110.4 (111.8)    110.5 (111.5)    112.0 (118.4)
α2      104.8 (108.1)    114.9 (116.8)    108.9 (111.4)    106.3 (108.5)
α4      126.0 (122.2)    125.0 (122.9)    124.7 (122.5)    123.2 (115.4)
α9      (178.9)          (165.7)          (173.9)

Formamide: monomer and dimer
r1      1.105 (1.106)    1.081 (1.080)    1.093 (1.091)    1.098c (1.010, 1.090*)d
r2      1.403 (1.375)    1.346 (1.328)    1.348 (1.331)    1.352 (1.318, 1.326*)
r3      1.014 (1.039)    0.992 (1.006)    0.994 (1.004)    1.002 (0.890, 1.010*)
r4      1.013 (1.014)    0.989 (0.990)    0.991 (0.991)    1.002 (0.870, 1.010*)
r5      1.218 (1.234)    1.216 (1.230)    1.193 (1.205)    1.219 (1.241, 1.239*)
r6      (2.639)          (2.896)          (2.988)          (2.948)
α1      111.4 (116.6)    113.7 (115.1)    112.8 (114.6)    112.7 (114.5, 116.0*)
α2      120.5 (118.6)    119.5 (120.1)    119.1 (119.4)    118.5 (119.6, 119.0*)
α3      121.3 (120.1)    121.9 (121.0)    121.6 (120.6)    120.0 (118.9, 118.0*)
α4      124.3 (119.1)    121.5 (119.7)    122.3 (119.9)    122.4 (120.5, 119.1*)
α9      (171.2)          (168.4)          (168.1)

Dimer of formic acid and formamide
r1      (1.106)          (1.080)          (1.090)
r2      (1.373)          (1.327)          (1.330)
r3      (1.035)          (1.003)          (1.001)
r4      (1.014)          (0.990)          (0.991)
r5      (1.235)          (1.232)          (1.207)
r6      (2.560)          (2.741)          (2.849)
r7      (1.107)          (1.073)          (1.085)
r8      (1.351)          (1.313)          (1.299)
r9      (1.012)          (0.980)          (0.965)
r11     (1.229)          (1.215)          (1.194)
r12     (2.618)          (2.824)          (2.908)
α1      (113.8)          (112.4)          (112.1)
α2      (118.1)          (120.2)          (119.7)
α3      (120.2)          (120.7)          (120.3)
α4      (122.2)          (122.5)          (122.3)
α5      (114.4)          (114.2)          (113.5)
α6      (108.4)          (116.8)          (111.4)
α8      (119.4)          (120.3)          (120.4)
α9      (166.2)          (162.2)          (162.5)
α10     (177.1)          (170.7)          (177.9)

a Ref. 44. b Ref. 42. c Ref. 45. d Ref. 43.

The convention has been introduced in Fig. 3. The geometrical parameters for the
dimers are given between parentheses. The experimental values for formic acid and
its dimer are denoted by a and b, respectively, while those of the formamide equivalents
are denoted by c and d. The center of inversion in the formic acid and formamide dimers
yields redundancies such as r3 = r9, etc., which reduces the number of tabulated parameters.


Table II. STO-3G, 4-31G, and 6-31G** net atomic charges, overlap populations, and
dipole moments (in debyes) for all situations, A to Q, represented in Fig. 4.

Formic acid: monomer and dimer

                     A        B        C        D        E
Charge
2       STO-3G     +.255    +.254    +.266    +.266    +.281
        4-31G      +.620    +.624    +.615    +.619    +.666
        6-31G**    +.591    +.589    +.587    +.587    +.618
3       STO-3G     -.285    -.281    -.281    -.277    -.328
        4-31G      -.720    -.717    -.730    -.727    -.747
        6-31G**    -.567    -.555    -.580    -.568    -.582
4       STO-3G     +.207    +.212    +.211    +.217    +.273
        4-31G      +.439    +.450    +.466    +.477    +.513
        6-31G**    +.362    +.371    +.386    +.394    +.418
6       STO-3G     -.251    -.257    -.229    -.234    -.305
        4-31G      -.554    -.571    -.534    -.551    -.655
        6-31G**    -.521    -.538    -.502    -.518    -.595
Population
2-3     STO-3G     0.28     0.29     0.28     0.29     0.30
        4-31G      0.14     0.12     0.14     0.12     0.19
        6-31G**    0.27     0.27     0.27     0.27     0.32
3-4     STO-3G     0.25     0.26     0.26     0.26     0.24
        4-31G      0.25     0.25     0.25     0.24     0.23
        6-31G**    0.30     0.30     0.30     0.30     0.29
2-6     STO-3G     0.44     0.43     0.44     0.43     0.42
        4-31G      0.47     0.46     0.45     0.45     0.40
        6-31G**    0.60     0.60     0.59     0.58     0.54
4-12    STO-3G     --       --       (0.04)   (0.04)   0.04
        4-31G      --       --       (0.02)   (0.02)   0.05
        6-31G**    --       --       (0.02)   (0.02)   0.04
Electric dipole
        STO-3G     0.625    0.913    0.887    1.119    0.0
        4-31G      1.594    1.897    1.655    1.953    0.0
        6-31G**    1.627    1.920    1.678    1.965    0.0
Exp. value: 1.41a

Formamide: monomer and dimer

                     F        G        H        I        J
Charge
2       STO-3G     +.254    +.242    +.262    +.252    +.262
        4-31G      +.587    +.583    +.585    +.581    +.602
        6-31G**    +.561    +.558    +.557    +.554    +.568
3       STO-3G     -.438    -.429    -.436    -.426    -.462
        4-31G      -.904    -.899    -.792    -.918    -.934
        6-31G**    -.730    -.721    -.743    -.734    -.744
4       STO-3G     +.200    +.205    +.203    +.209    +.265
        4-31G      +.386    +.393    +.418    +.426    +.474
        6-31G**    +.322    +.327    +.343    +.349    +.385
6       STO-3G     -.266    -.274    -.253    -.255    -.312
        4-31G      -.611    -.623    -.592    -.602    -.682
        6-31G**    -.562    -.576    -.543    -.556    -.617
Population
2-3     STO-3G     0.37     0.37     0.37     0.38     0.39
        4-31G      0.11     0.10     0.12     0.10     0.17
        6-31G**    0.27     0.26     0.27     0.26     0.31
3-4     STO-3G     0.35     0.35     0.36     0.35     0.32
        4-31G      0.31     0.31     0.30     0.30     0.26
        6-31G**    0.35     0.34     0.34     0.34     0.32
2-6     STO-3G     0.44     0.43     0.44     0.43     0.42
        4-31G      0.51     0.50     0.49     0.49     0.46
        6-31G**    0.61     0.61     0.60     0.60     0.56
4-12    STO-3G     --       --       (0.02)   (0.02)   0.04
        4-31G      --       --       (0.03)   (0.03)   0.05
        6-31G**    --       --       (0.02)   (0.02)   0.03
Electric dipole
        STO-3G     2.644    2.797    3.069    2.873    0.0
        4-31G      4.470    4.645    4.742    4.564    0.0
        6-31G**    4.096    4.169    4.261    4.337    0.0
Exp. value: 3.71b

Constituents of the formic acid and formamide pair

                     K        L        M        N        O        P        Q
Charge
2       STO-3G     +.245    --       --       +.263    --       +.256    +.270
        4-31G      +.587    --       --       +.585    --       +.585    +.618
        6-31G**    +.560    --       --       +.557    --       +.557    +.576
3       STO-3G     -.430    --       --       -.435    --       -.427    -.461
        4-31G      -.896    --       --       -.922    --       -.914    -.929
        6-31G**    -.718    --       --       -.743    --       -.730    -.741
4       STO-3G     +.205    --       --       +.204    --       +.209    +.265
        4-31G      +.391    --       --       +.419    --       +.424    +.475
        6-31G**    +.326    --       --       +.344    --       +.347    +.387
6       STO-3G     -.274    --       --       -.249    --       -.252    -.310
        4-31G      -.626    --       --       -.593    --       -.604    -.703
        6-31G**    -.578    --       --       -.543    --       -.558    -.630
10      STO-3G     --       +.213    +.210    --       +.217    --       +.272
        4-31G      --       +.452    +.464    --       +.479    --       +.513
        6-31G**    --       +.373    +.381    --       +.606    --       +.414
9       STO-3G     --       -.280    -.282    --       -.276    --       -.331
        4-31G      --       -.719    -.731    --       -.729    --       -.755
        6-31G**    --       -.554    -.578    --       -.569    --       -.584
8       STO-3G     --       +.252    +.264    --       +.263    --       +.272
        4-31G      --       +.622    +.615    --       +.617    --       +.653
        6-31G**    --       +.588    +.587    --       +.584    --       +.609
12      STO-3G     --       -.257    -.235    --       -.238    --       -.305
        4-31G      --       -.569    -.534    --       -.549    --       -.637
        6-31G**    --       -.539    -.502    --       -.517    --       -.582
Population
2-3     STO-3G     0.37     --       --       0.37     --       0.38     0.39
        4-31G      0.10     --       --       0.12     --       0.10     0.19
        6-31G**    0.26     --       --       0.27     --       0.26     0.32
3-4     STO-3G     0.35     --       --       0.35     --       0.35     0.32
        4-31G      0.31     --       --       0.30     --       0.30     0.27
        6-31G**    0.34     --       --       0.34     --       0.34     0.32
2-6     STO-3G     0.43     --       --       0.44     --       0.43     0.42
        4-31G      0.50     --       --       0.49     --       0.49     0.44
        6-31G**    0.61     --       --       0.60     --       0.59     0.55
4-12    STO-3G     --       --       --       (0.03)   --       (0.03)   0.04
        4-31G      --       --       --       (0.02)   --       (0.02)   0.04
        6-31G**    --       --       --       (0.02)   --       (0.02)   0.03
6-10    STO-3G     --       --       (0.03)   --       (0.03)   --       0.04
        4-31G      --       --       (0.02)   --       (0.03)   --       0.06
        6-31G**    --       --       (0.02)   --       (0.03)   --       0.04
10-9    STO-3G     --       0.26     0.26     --       0.26     --       0.24
        4-31G      --       0.25     0.24     --       0.24     --       0.22
        6-31G**    --       0.30     0.30     --       0.29     --       0.28
9-8     STO-3G     --       0.29     0.28     --       0.29     --       0.30
        4-31G      --       0.12     0.14     --       0.12     --       0.18
        6-31G**    --       0.27     0.27     --       0.27     --       0.31
8-12    STO-3G     --       0.43     0.44     --       0.43     --       0.42
        4-31G      --       0.46     0.45     --       0.45     --       0.41
        6-31G**    --       0.59     0.59     --       0.59     --       0.56
Electric dipole
        STO-3G     0.917    2.799    0.852    2.911    1.088    3.099    2.293
        4-31G      1.885    4.665    1.658    4.564    1.943    4.765    3.196
        6-31G**    1.901    4.282    1.681    4.170    1.948    4.360    2.687

a Ref. 55.
b Ref. 45.


ure  4.  Table  III  lists  the  total  energies  ET  corresponding  to  the  situations  A  to  Q.  and 
the stabilization (complexation) energies $\Delta E$, defined as the total energy of the dimer
minus the energy of the monomers with or without corrections for the basis set
superposition error.
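The definition of the stabilization energy given here can be written as a few lines of Python. This is only a sketch: the energies used in the example are hypothetical placeholders and are not values from Table III, and the conversion factor of 627.5095 kcal mol⁻¹ per hartree is the commonly used one.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095  # 1 au of energy expressed in kcal/mol


def stabilization_energy(e_dimer, e_monomers):
    """Total energy of the dimer minus the sum of the monomer energies.

    Inputs are in au (hartree); the result is returned in kcal/mol.
    A negative value means that the hydrogen-bonded pair is bound.
    """
    return (e_dimer - sum(e_monomers)) * HARTREE_TO_KCAL_PER_MOL


if __name__ == "__main__":
    # Hypothetical energies, chosen only to illustrate the arithmetic.
    e_dimer = -377.50
    e_monomers = [-188.74, -188.74]
    print(f"Delta E = {stabilization_energy(e_dimer, e_monomers):.2f} kcal/mol")
```

Passing monomer energies computed in their own basis gives the uncorrected value; passing monomer energies computed in the presence of the partner's ghost functions gives the counterpoise-corrected one.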

Our results on optimum geometries are in most cases in very good agreement with
previously published STO-3G [36,37] and 4-31G [38-40] results on formic acid, formamide,
and their respective dimers. We could not find corresponding results for the
6-31G** basis, but our results are reasonably consistent with those obtained by
Mijoule et al. [41] on the formic acid dimer calculated with the 6-31G basis, in spite
of the linearity of the O–H···O angle ($\alpha_9$ in our notation) assumed by these
authors, and by Sapse et al. on formamide obtained with the 6-31G and 6-31G* basis
sets for the monomer and dimer. To the best of our knowledge, geometry optimizations
on the heterodimer of formic acid and formamide have not been reported with
similar bases.

Table III. STO-3G, 4-31G, and 6-31G** total energies $E_T$ (au) and stabilization energies (kcal mol⁻¹) for the situations represented in Fig. 4.

As already pointed out in the previous theoretical works, the C=O distance in
formic acid and formamide moieties increases upon dimerization irrespective of the
basis set extension, though at a different rate. At the 6-31G** level, the C=O elongation
in the three dimers is equal to 0.012 Å and 0.014 Å in the formic acid and formamide
moieties, respectively. As a counterpart to this elongation, there is a
shortening in the C–O and C–N distances, respectively, of 0.023 Å and 0.017 Å
when calculated at the 6-31G** level. Both the O–H and N–H distances increase,
respectively, by about 0.013 Å and 0.017 Å in the three dimers, as can be noticed
from the 6-31G** data in Table I. Except for the STO-3G results on the formic acid
dimer, the hydrogen bonds are slightly bent, as noticed from the O–H···O and
N–H···O angles, $\alpha_9$ and $\alpha_{10}$, listed in the table.

The dimer separations turn out to be very sensitive to the basis set extension. For
instance, the O–H···O distance, r6, in the formic acid dimer increases substantially
from STO-3G to 4-31G (0.172 Å), but also from 4-31G to 6-31G** (0.081 Å).
A similar trend is observed in the case of formamide, where the corresponding
changes are 0.257 Å and 0.092 Å. In their paper on the formic acid dimer, Hayashi et
al. [40] concluded that there is excellent agreement between the 4-31G predictions (2.705 Å)
and the experimental values obtained from electron diffraction measurements
(2.703 Å) [42], but this kind of agreement is not reproduced in the case of the
formamide dimer: 2.878 Å theoretically and 2.948 Å experimentally [43]. At the
6-31G** level, the O–H···O and N–H···O distances are equal to 2.789 Å
and 2.988 Å, respectively, in the dimers of formic acid and formamide. In the
heterodimer the corresponding O–H···O and N–H···O distances are equal to
2.849 Å and 2.908 Å. The results by Sapse et al. on the formamide dimer show that
the 6-31G and 6-31G* geometrical parameters of the individual moieties in the dimer
are comparable with the 6-31G** results obtained in this work. However, the separation
between the monomers (measured in Ref. 35 by the diagonal distance C···N between
the carbon in one moiety and the nitrogen in the other) is quite different when
calculated at the 6-31G (3.520 Å), 6-31G* (3.786 Å), and 6-31G** (3.878 Å) levels.
From these results it turns out that large basis sets including polarization functions,
6-31G* and 6-31G**, overestimate the distances between the monomers in the complexes
when compared to experiment. Significant deviations occur also in the C=O,
C–O, and C–N bonds, which are directly involved in the hydrogen bond formation.
Of course one could incriminate the basis set, but this contradicts the overall
agreement of the 6-31G** results on the formic acid and formamide monomers with
the most recent microwave measurements by Davis et al. [44] and Hirota et al. [45].
Another explanation for these discrepancies might well be that the interactions existing
in the bulk preclude a realistic comparison with theoretical calculations on the
isolated dimers.
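The basis set increments quoted for the hydrogen bond distance r6 follow directly from the dimer values in Table I. As a small check, the Python sketch below recomputes them; the dictionary layout and the function name are illustrative choices, while the numbers are the r6 entries of Table I.

```python
# r6 (in angstroms) for the two homodimers, taken from Table I.
R6 = {
    "formic acid dimer": {"STO-3G": 2.536, "4-31G": 2.708, "6-31G**": 2.789},
    "formamide dimer": {"STO-3G": 2.639, "4-31G": 2.896, "6-31G**": 2.988},
}

BASIS_ORDER = ["STO-3G", "4-31G", "6-31G**"]


def basis_set_increments(values):
    """Change in r6 between successive basis sets, rounded to 0.001 angstrom."""
    return [
        round(values[larger] - values[smaller], 3)
        for smaller, larger in zip(BASIS_ORDER, BASIS_ORDER[1:])
    ]


if __name__ == "__main__":
    for dimer, values in R6.items():
        print(dimer, basis_set_increments(values))
```

The same pattern could be applied to any other parameter of Table I to follow its evolution with the basis set extension.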

In the Cambridge data base, the only reported single-crystal x-ray diffraction measurements
on the cyclic dimer of formic acid are those by Karle and Brockway [46]
in 1944. More recent electron diffraction measurements have been published by
Almeningen et al. [42]; both are reproduced in Table I. For the formamide dimer, a
recent x-ray determination has been made by Stevens [43]. In this work, full x-ray
data and geometrical parameters corrected for the bias introduced by aspherical bonding
density are given. Both are reproduced in Table I, the corrected data being denoted
by a superscript asterisk. Finally, Nahringbauer and Larsson [47] have
published x-ray measurements on the crystal structure of the 1:1 addition compound
of formic acid with formamide, HCOOH·HCONH2. The structure contains puckered
layers of two types. The layer relevant to our study contains dimers of formamide
molecules coupled by hydrogen bonds with a N–H···O distance equal to
2.990 Å and formic acid molecules, which cross-link the dimers by hydrogen bonds
(2.602 and 2.946 Å). These measurements suggest an explanation for the systematic
difference observed between 6-31G** results and experiment. In the crystal, the
molecules are linked by more than one type of hydrogen bond, as shown in Figure 5.
For instance, in the case of formamide, a N–H···O hydrogen bond with a distance
of 2.948 Å connects the molecules and is responsible for the dimer formation,
but another hydrogen bond, N–H···O' with a distance of 2.885 Å, joins alternate
dimers into chains [48]. These distances are nearly equal and the interactions
should thus be of comparable strength, which might explain why the structural
parameters for the dimers in the crystal are so different from those in the gas phase.
Hinton and Harpool [49] have made investigations of (formamide)n systems to define
models for the liquid state and dilute aqueous solution. Due to the size of the systems,
only hydrogen bond lengths were optimized to obtain minimum energy positions.
Unfortunately, the authors have not indicated the optimized distances in their
work, and this can only be a first step since all parameters should be optimized to allow
meaningful comparison with experiment.

Figure 5. Arrangement of the formamide dimers in the crystal and indication of their
interlinking through hydrogen bonds.

Work is now in progress to simulate the immediate environment of the dimers and
conduct geometry optimizations to assess the influence of the other hydrogen bonds
both on the geometrical parameters of the dimer moieties and on the monomer separation
in the cyclic pairs. Drastic influences on the barrier height for single or double
proton transfer might result.

Net Atomic Charges, Overlap Populations, and Dipole Moments. Compared to the
4-31G and 6-31G** results, the charge separation in the STO-3G basis is less
pronounced; however, the 6-31G** values are somewhat reduced compared to 4-31G
(see Table II). This corroborates the observation by Sapse et al. [35], who noticed that
the 6-31G* charges are smaller than those of 6-31G. The trend continues with the
6-31G** values, which are smaller than those of 6-31G*. Overlap populations reflect the
geometry changes occurring in the different molecular arrangements. Except in the
case of the minimal STO-3G basis, the net atomic charges are only slightly affected by
the addition of the ghost functions and by their locations. It is when the dimerization
occurs that the charge increase becomes more important. The carbon and hydrogen
atoms involved in the dimerization become more positively charged, while the charges of
the oxygen and nitrogen atoms become more negative.

Dipole moments are more sensitive to the presence of extra basis functions and to the
location of their centers. As usual, the minimal basis set underestimates the dipole
moments. With the 4-31G and 6-31G** bases the nonvanishing dipoles are more than
doubled compared to STO-3G. Notice, however, that while the dipole moment is
larger in 6-31G** than in 4-31G in the case of the formic acid monomer, the reverse
is true for formamide.

Stabilization Energies. Table III illustrates how strongly the counterpoise correction
for basis set superposition error depends not only on the quality of the basis set, but
also on the geometry of the monomers and the location of the centers of the ghost
functions. In the case of the STO-3G basis and for the three dimers, large variations
in the stabilization energies are found depending on the way ΔE is defined. Unrealistic
stabilization energies are obtained when the counterpoise correction is applied to the
minimal STO-3G basis; these differences tend to decrease for larger basis sets. As
observed previously in the comparison of 6-31G and 6-31G* stabilization energies of the
formamide complex, the STO-3G values for ΔE without counterpoise corrections are
remarkably closer to the 6-31G** values than are the 4-31G ones. This was interpreted as
a fortunate cancellation of errors for the minimal basis set [35, 50]. The small
separation of charge observed for the STO-3G basis decreases the electrostatic
contribution to the binding energy and thereby partly compensates for the spurious
extra stabilization that the correction for superposition error would have removed.
This is nicely illustrated by the artificial values of ΔE obtained when the
counterpoise correction is applied to the STO-3G results. Fortuitous as this
cancellation is, the STO-3G values for ΔE (free of counterpoise correction) are
remarkably and systematically close to the corresponding 6-31G** values for the three
dimers.

If one considers the values of ΔE predicted with the 6-31G** basis set, it turns out
that there is no unique ordering of the stabilization energies for the three
isoelectronic dimers. Even with such a large basis set, ΔE still depends on the
convention chosen to calculate the monomer energies.
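
To make the dependence on the convention explicit, the two most common definitions of the stabilization energy can be written as follows. This is only a sketch in the standard Boys-Bernardi notation, which is assumed here and not taken from the text: the superscript in braces names the basis set in which an energy is evaluated, and $R$ denotes the dimer geometry.

```latex
% Uncorrected stabilization energy: each monomer in its own basis
\Delta E = E_{AB}^{\{AB\}}(R) - E_{A}^{\{A\}} - E_{B}^{\{B\}}

% Counterpoise-corrected stabilization energy: each monomer evaluated
% in the full dimer basis (ghost functions placed on the partner's centers)
\Delta E^{\mathrm{CP}} = E_{AB}^{\{AB\}}(R) - E_{A}^{\{AB\}}(R) - E_{B}^{\{AB\}}(R)
```

The remaining freedom lies in the monomer terms: they may be computed at the equilibrium monomer geometry or at the distorted geometry adopted in the dimer, and the ghost functions may be centered at different positions, which is why several values of ΔE can be quoted for one basis set.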


Average  Electric  Polarizability 

As indicated in the Introduction, it is important to assess how the transmission of
conjugation is affected by the hydrogen bonds in the bridging cyclic dimers. Real
molecules will be longer than the simple dimers studied in this work, and the use of
large basis sets will be intractable in these studies. Qualitative trends are of use
provided they are predicted consistently, and in view of our experience with conjugated
chains, we could have used STO-3G results. Indeed, in the case of pure hydrocarbon
chains as well as chains incorporating oxygen and nitrogen [23, 51, 52], the STO-3G
calculated longitudinal polarizabilities follow the 4-31G values satisfactorily to
within a reasonably constant scaling factor, ≈1.45. However, the numerous warnings in
the literature about hydrogen bonds on the one hand, and the sensitivity of the
polarizability to the quality of the basis functions on the other, suggest a cautious
approach to this problem. For instance, Karlstrom and Sadlej [53] indicate that for two
interacting water molecules, the basis set superposition results in a considerable
increase of the x component of the dipole moment and the xx component of the
polarizability tensor for the water molecule which plays the role of the electron pair
donor (x is the direction of approach of the molecules).

Our results on the average electric polarizability for the three basis sets and the
various molecular arrangements schematized in Figure 4 are listed in Table IV. It can
be seen that the polarizability of the distorted monomers (B, G, K, and L) is
systematically larger than in the equilibrium geometry (A and F), but the largest
changes occur when the ghost functions are added (C, D, H, I, M, N, O, and P). A measure
of the change in polarizability Δα upon dimerization can be obtained from the
polarizability of the complex minus the sum of the polarizabilities of the monomers.
Various possibilities exist and the results are listed in the table. The variations of
Δα with respect to its various definitions are similar for each basis set, and, from
left to right in the table, Δα decreases systematically. Most important is that there
is no significant departure from additivity for ⟨α⟩ due to the dimerization. This is in
agreement with the results of Dykstra and Liu in their study of the electrical
properties of hydrogen fluoride and the hydrogen fluoride dimer [54], for which very
small changes are observed in the polarizability with the distance between the two
monomers. From our results it can be observed that Δα tends to decrease when larger
basis sets are used. In the simplest way to calculate Δα, namely, αE - 2αA, αJ - 2αF,
and αQ - (αA + αF), the ratios Δα(STO-3G)/Δα(6-31G**) range from 2.33 to 2.64, and are
acceptably constant. It can also be noted that Δα(STO-3G) including superposition
corrections for the optimized dimers (third column in the table) compares reasonably
well with Δα(6-31G**) in the first column.
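
The quoted ratios can be checked directly from the tabulated polarizabilities. The short Python sketch below copies the STO-3G and 6-31G** values of α for arrangements A, F, E, J, and Q from Table IV and evaluates the simplest definition of Δα; the function and variable names are illustrative choices only.

```python
# Average polarizabilities (au) copied from Table IV.
# A, F: isolated formic acid and formamide monomers;
# E, J, Q: formic acid dimer, formamide dimer, and the mixed pair.
ALPHA = {
    "STO-3G":  {"A": 8.61,  "F": 10.48, "E": 19.17, "J": 23.27, "Q": 21.20},
    "6-31G**": {"A": 14.75, "F": 18.07, "E": 30.24, "J": 37.13, "Q": 33.68},
}


def delta_alpha(basis):
    """Simplest definition: complex minus the sum of the free monomers."""
    a = ALPHA[basis]
    return {
        "formic acid dimer": a["E"] - 2 * a["A"],
        "formamide dimer": a["J"] - 2 * a["F"],
        "mixed pair": a["Q"] - (a["A"] + a["F"]),
    }


def ratios(small="STO-3G", large="6-31G**"):
    """Ratio of the change in polarizability between two basis sets."""
    d_small, d_large = delta_alpha(small), delta_alpha(large)
    return {name: d_small[name] / d_large[name] for name in d_small}


if __name__ == "__main__":
    for name, value in ratios().items():
        print(f"{name}: {value:.2f}")
```

The three ratios fall between 2.33 and 2.64, as stated in the text.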

As to the qualitative effects with which we are mostly concerned, it can be concluded
that the polarizability of the complexes of formic acid and formamide turns out to be
an additive quantity in the monomer contributions. Irrespective of the definition of
Δα, the changes remain small, the largest being observed for the STO-3G results
without superposition corrections. Therefore (in the absence of pairs of strong
electron-donating and electron-accepting groups at the ends of the conjugated chains,
which could change the picture), we may anticipate that, besides the advantage of
partially blocking the conformational freedom in the carbon skeleton, the hydrogen
bridges will probably neither impair nor enhance the overall polarizability of the
conjugated chains we are considering as interesting candidates for electro-optical
applications.

Table IV. STO-3G, 4-31G, and 6-31G** average polarizabilities α (au) for the situations
represented in Fig. 4.

Formic acid: monomer and dimer

α             A         B         C         D         E
STO-3G       8.61      8.78      9.23      9.43     19.17
4-31G       13.57     13.68     14.09     14.25     27.42
6-31G**     14.75     14.84     15.24     15.36     30.24

Δα         αE-2αA    αE-2αB    αE-2αC    αE-2αD
STO-3G       1.95      1.61      0.71      0.31
4-31G        0.28      0.06     -0.76     -1.08
6-31G**      0.74      0.56     -0.24     -0.48

Exp. value: 22.31 (a)

Formamide: monomer and dimer

α             F         G         H         I         J
STO-3G      10.48     10.61     10.98     11.24     23.27
4-31G       16.66     16.69     17.25     17.38     34.25
6-31G**     18.07     18.13     18.60     18.67     37.13

Δα         αJ-2αF    αJ-2αG    αJ-2αH    αJ-2αI
STO-3G       2.31      2.05      1.31      0.79
4-31G        0.93      0.87     -0.25     -0.51
6-31G**      0.99      0.87     -0.07     -0.21

Exp. value: 27.53 (b)

Constituents of the formic acid and formamide pair

α             K         L         M         N         O         P         Q
STO-3G       9.41     11.23      8.81      9.15     10.58     11.05     21.20
4-31G       14.27     17.38     13.70     14.09     16.75     17.26     30.83
6-31G**     15.35     18.69     14.84     15.21     18.12     18.62     33.68

Δα         αQ-(αA+αF)   αQ-(αB+αG)   αQ-(αK+αL)   αQ-(αM+αN)   αQ-(αO+αP)
STO-3G        2.11         1.81         1.81         1.00         0.56
4-31G         0.60         0.46         0.38        -0.52        -0.82
6-31G**       0.86         0.71         0.72        -0.15        -0.36

(a) Ref. 58.
(b) Ref. 59.

Concluding  Remarks 

The aim of this study was to analyze the influence of dimerization on the
polarizability of three cyclic pairs involving formic acid and/or formamide. The
central question was to determine whether the polarizability deviates strongly from
additivity and in which direction (i.e., enhancement or decrease). Strictly speaking,
the idea of comparing the polarizability of a dimer to the polarizability of the
contributing monomers is questionable because of the structural and electronic changes
taking place during the dimerization, as well as the difficulties in describing the
partner systems at comparable levels of rigor. To minimize these influences we have
examined different possible definitions of the polarizability changes. The main results
of this study are as follows:

1. Geometry relaxations in the dimers are very sensitive to the basis set extension
and are still significant when going from the 6-31G* to the 6-31G** basis. Direct
comparison of the structural parameters of the isolated dimers with experimental data
does not seem to be appropriate, owing to the hydrogen bonds interlinking the dimers in
the solid state.

2. Qualitative changes in polarizability upon dimerization can be accounted for on
the basis of an additivity scheme. This is true for the various definitions used to
estimate these changes.

On the basis of the present results, it is reasonable, in the absence of pairs of
strong donor and acceptor groups incorporated in the conjugated system, to anticipate
that no significant influence (adverse or favorable) will result on the overall
polarizability contributions of conjugated chains connected by cyclic hydrogen-bonded
pairs.
Abstract

A new definition of molecular similarity in terms of electron density is proposed, and a method for
calculating similarity based on molecular electrostatic potential and molecular electric field is
introduced. It is applied to some simple isosteres.


Introduction 


Replacement of a functional group in a drug molecule gives rise to a change in the
biological activity. Chemists use the concept of bioisosterism [1] to attempt to predict
how successful a particular substitution might be.

This idea may be quantified theoretically by comparing the electron densities, ρ_A
and ρ_B, of two molecules A and B, and calculating an index of similarity, R_AB
[Eq. (1)], as first introduced by Carbo et al. [2].


$$R_{AB} = \frac{\int \rho_A \rho_B \, dv}{\left(\int \rho_A^2 \, dv\right)^{1/2} \left(\int \rho_B^2 \, dv\right)^{1/2}} \qquad (1)$$


with the integrations being over all space.

The denominator is a normalizing constant and R_AB varies in the range 0 to 1. Such
an index of similarity is required to have a value of 1 when the electron density
distributions in the two molecules are identical. However, substitution of ρ_A = nρ_B
into Eq. (1), where n is a constant, also gives an index of unity. Thus the Carbo index
represents the similarity of the shapes of the density distributions but not of their
magnitudes. This formula has been applied to electron density in a variety of ways
[3-5].

We now propose an alternative definition of molecular similarity, H_AB [Eq. (2)].

$$H_{AB} = \frac{2 \int \rho_A \rho_B \, dv}{\int \rho_A^2 \, dv + \int \rho_B^2 \, dv} \qquad (2)$$

H_AB can take values between 0 and 1. Substitution of ρ_A = nρ_B gives a value for
H_AB of 2n/(1 + n²). Thus the formula gives a total similarity of both shape and
magnitude of the density distribution.
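
The different behavior of the two indices under a uniform scaling of the density can be illustrated numerically. The Python sketch below evaluates Eqs. (1) and (2) on sampled values; the one-dimensional Gaussian "density" and the scaling factor n = 3 are arbitrary illustrative choices, not data from this work, and the constant volume element is omitted because it cancels in both ratios.

```python
import math


def carbo_index(rho_a, rho_b):
    """Eq. (1): overlap normalized by the geometric mean of the self-overlaps."""
    num = sum(a * b for a, b in zip(rho_a, rho_b))
    den = math.sqrt(sum(a * a for a in rho_a)) * math.sqrt(sum(b * b for b in rho_b))
    return num / den


def hodgkin_index(rho_a, rho_b):
    """Eq. (2): overlap normalized by the arithmetic mean of the self-overlaps."""
    num = 2.0 * sum(a * b for a, b in zip(rho_a, rho_b))
    den = sum(a * a for a in rho_a) + sum(b * b for b in rho_b)
    return num / den


# Toy example: rho_A = n * rho_B with n = 3 on a uniform 1-D grid.
grid = [0.1 * i for i in range(-50, 51)]
rho_b = [math.exp(-x * x) for x in grid]
n = 3.0
rho_a = [n * value for value in rho_b]

if __name__ == "__main__":
    print("R_AB =", carbo_index(rho_a, rho_b))    # shape only: unity
    print("H_AB =", hodgkin_index(rho_a, rho_b))  # 2n/(1 + n^2) = 0.6
```

The Carbo index cannot distinguish the two distributions, whereas H_AB falls to 2n/(1 + n²) = 0.6.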

Similarity  of  Potential  and  Field 

The calculation of molecular similarity need not, however, be restricted to electron
density. Both styles of index can, in principle, be applied to any property of a
molecule. Molecular electrostatic potential (MEP) and molecular electric field (MEF)
are well suited to this approach, although since these are signed quantities the values
will run from -1 to +1. Molecular electrostatic potential is widely used in techniques
of molecular design and in quantitative structure-activity relations (QSAR) studies.
The MEF is less frequently used, largely as a result of the difficulty of displaying a
vector quantity in three dimensions, but it is important because the scalar product of
the field and a dipole gives the energy of the dipole at a given point [6]. Dipolar
interactions are important in ligand-macromolecule binding and in solvation.

The use of the H_AB index is particularly important for calculating the MEP and MEF
similarity because these properties may be of similar shape for a pair of molecules,
while their absolute values are of critical importance. Both properties can be
calculated using Mulliken population charges [7] from a suitable molecular orbital
package such as MOPAC [8]. The integration is performed numerically over a
three-dimensional grid. While MEP values are simply multiplied at each grid point, the
product of MEF vectors is taken as the scalar product. The definition of the grid
(i.e., extent and density) is important. It was found that a three-dimensional grid
extending 10 Å around the molecules on all sides, with grid points 1 Å apart, was
sufficient to approach very close to the limit of the similarity (within 1% of the
similarity obtained using a very large, finely meshed grid) for both MEP and MEF. In
addition, this grid definition allows a calculation to be performed in a reasonable
amount of computer time. The value of the similarity index is virtually insensitive to
movement of the grid.

Clearly, there is a need to exclude the volume contained within the molecules in
some way so as to avoid singularities. It was decided that the molecular volumes
should be defined by the Van der Waals surfaces and that the volume lying within
both molecules should be excluded altogether from the calculation. This problem is
tackled by allocating values of zero to the MEP or MEF at grid points inside the
molecule. This is illustrated by Figure 1. The shaded area (A ∩ B) is inside both
molecules and does not contribute to the similarity. The remaining areas (the part of
A outside B and the part of B outside A) contribute only to the denominator of
Eq. (2). The rationale behind excluding the Van der Waals volume is twofold. First, the
point-charge approximation of MEP and MEF gives realistic numbers only outside the Van
der Waals surface [9], and second, the volume within the molecule is of no relevance to
interactions with the environment.

Figure 1. Intersection of the Van der Waals surfaces of two superimposed molecules.
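
The grid procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the program used in this work: each molecule is a list of (charge, position) pairs, a single hypothetical Van der Waals radius of 1.5 Å is used for every atom, the three-site test molecule is invented, and all function names are illustrative. Only the 10 Å extent and 1 Å spacing defaults follow the text.

```python
import math


def frange(lo, hi, step):
    """Evenly spaced grid coordinates from lo up to (at most) hi."""
    count = int(math.floor((hi - lo) / step + 1e-9)) + 1
    return [lo + i * step for i in range(count)]


def inside(mol, r, radius):
    """True if r lies within `radius` of any atom (assumed uniform radius)."""
    return any(math.dist(r, pos) < radius for _, pos in mol)


def mep(mol, r):
    """Point-charge electrostatic potential at r, returned as a 1-tuple."""
    return (sum(q / math.dist(r, pos) for q, pos in mol),)


def mef(mol, r):
    """Point-charge electric field vector at r."""
    field = [0.0, 0.0, 0.0]
    for q, pos in mol:
        d3 = math.dist(r, pos) ** 3
        for k in range(3):
            field[k] += q * (r[k] - pos[k]) / d3
    return tuple(field)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hodgkin_similarity(mol_a, mol_b, prop=mep, extent=10.0, step=1.0, radius=1.5):
    """Eq. (2) summed over a grid; a property is zero inside its own molecule."""
    atoms = [pos for _, pos in mol_a + mol_b]
    axes = [frange(min(p[k] for p in atoms) - extent,
                   max(p[k] for p in atoms) + extent, step) for k in range(3)]
    num = norm_a = norm_b = 0.0
    for x in axes[0]:
        for y in axes[1]:
            for z in axes[2]:
                r = (x, y, z)
                # None stands for the zero value assigned inside a molecule.
                va = None if inside(mol_a, r, radius) else prop(mol_a, r)
                vb = None if inside(mol_b, r, radius) else prop(mol_b, r)
                if va is not None:
                    norm_a += dot(va, va)
                if vb is not None:
                    norm_b += dot(vb, vb)
                if va is not None and vb is not None:
                    num += dot(va, vb)  # product (MEP) or scalar product (MEF)
    return 2.0 * num / (norm_a + norm_b)


# Invented three-site molecule (charges in e, coordinates in Å).
toy = [(-0.30, (0.0, 0.0, 0.0)), (0.15, (1.2, 0.8, 0.0)), (0.15, (-1.2, 0.8, 0.0))]
toy_doubled = [(2.0 * q, pos) for q, pos in toy]

if __name__ == "__main__":
    print(hodgkin_similarity(toy, toy, prop=mep))          # identical molecules
    print(hodgkin_similarity(toy, toy_doubled, prop=mef))  # same shape, doubled charges
```

Because both properties are linear in the charges, doubling every charge leaves the shape unchanged and gives 2n/(1 + n²) = 0.8 for either MEP or MEF, which is a convenient check on an implementation.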


Superimposition 

In addition to supplying information about similarity, the index can be used as a
criterion for superimposing molecules in the optimum fashion so as to permit
inferences about receptor topology. A variety of methods were tried for maximizing the
MEP and MEF similarity. The most reliable of these proved to be a routine from the
NAG mathematical library (E04CCF) employing the Simplex method [10]. Initially
the two molecules may be superimposed in any relative orientation. One of the
molecules is given complete freedom of motion in three dimensions and allowed to
move toward the position at which the similarity is a maximum. Typically the Simplex
method finds the maximum after 300 calculations of the index and consequently is
fairly costly in computer time. However, the method is robust in that a maximum
is always reached which is independent of the starting point.

A series of bioisosteres, Me2CH2, Me2O, and Me2S, has been the subject of some
electron density similarity calculations [5, 11]. The MEP and MEF methods have been
applied to this series. Atomic charges were computed using MOPAC. The results are
shown in Tables I and II. The first set of values in each table shows the similarity of
each pair of molecules when they are aligned along the same principal axis with the


Table I. Results of MEP similarity calculations on propane, dimethyl ether, and dimethyl thioether.


(i)  Superimposition 
of  central  atoms 

(ii)  Position  of 
maximum  similarity 

Rab 

Hab 

Raj, 

Hab 

Me,CH2/Me20 

0.67 

mm 

mm 

Me2CH2/Me2S 

lEmStl 

Me20/Me2S 

0.92 

0.90 

0.92 

Table II. Results of MEF similarity calculations on propane, dimethyl ether, and dimethyl thioether.

                 (i) Superimposition       (ii) Position of
                 of central atoms          maximum similarity
                 R_AB       H_AB           R_AB       H_AB
Me2CH2/Me2O      0.30       0.017          0.33       [illegible]
Me2CH2/Me2S      0.21       0.016          0.32       [illegible]
Me2O/Me2S        0.81       0.78           0.85       0.83
central atoms superimposed. The striking feature in both tables of values is the
difference between the Carbo and new formulae. In the comparisons involving propane the
two indices differ greatly. This reflects the magnitudes of the atomic charges in
propane, which are very small compared with those in dimethyl ether and dimethyl
thioether. For example, the central carbon in propane has a charge of -0.05, the
sulfur of dimethyl thioether -0.20, and the oxygen of dimethyl ether -0.34. However,
the pattern of atomic charges in each molecule is similar and consequently their MEPs
and MEFs are similarly shaped. Thus it is possible for the Carbo index to take an
unrealistically high value. The Me2O/Me2S comparison gives good agreement between
the Carbo and H_AB indices, reflecting the real similarity between their MEP and MEF
distributions. The second set of values in each table shows the result of the similarity
being maximized. The relative positions of the molecules after this maximization are
not shown. However, they do not move far from the position in which the central
atoms are overlapping (typically 0.2 Å). It is interesting to note that in all cases the
molecules come to rest in different planes.


Figure 2. Hodgkin similarity indices (MEP and MEF) for the comparisons Me2CH2/Me2O
and Me2CH2/Me2S.
In addition, we observed the effect of moving the molecules relative to one another
along the principal axis. The behavior of the MEP and MEF similarities was observed
for propane/dimethyl ether, propane/dimethyl thioether (Fig. 2), and dimethyl
ether/dimethyl thioether (Fig. 3) using the H_AB index. In general, the MEF similarity
decreases more rapidly than the MEP similarity as the molecules are moved apart.
This is expected because the electric field itself decays more rapidly than the
electrostatic potential.

The use of MEP and MEF similarity calculations is very attractive in QSAR studies,
given the importance that medicinal chemists attach to these properties. Previously
[12] molecular shape has been shown to be a good descriptor in terms of correlations
with activity. This work extends that idea so as to include electronic effects. It is
hoped that the techniques described will be used to study bioactive molecules in order
to relate similarity to biological activity or toxicity.


Figure 3. Hodgkin similarity indices (MEP and MEF) for the comparison Me2O/Me2S.
Abstract

Cumulative atomic multipole moments (CAMM) have been calculated for normal, rare, and protonated
forms of adenine, thymine, guanine, cytosine, uracil, and 2-aminopurine from ab initio LCAO-MO-SCF
wave functions obtained with the all-valence MODPOT basis set and ab initio effective core potentials.
CAMM may be used in calculating electrostatic molecular potentials, electric fields, field gradients,
etc., as well as intermolecular interaction energies. Additionally, we derived analytic expressions for
the point charge assemblages representing simultaneously all atomic and molecular moments.
Convergence of the atomic versus molecular multipole expansion is illustrated in the Appendix.


Introduction 

There is no general consensus as to how to define atomic charges in an unambiguous
way. Fortunately, their arbitrariness and basis set dependence can be compensated by
including higher terms in the atomic multipole expansion, at least up to the quadrupole
moment [1]. Formally, all available multicenter multipole expansions [1-11] are
equivalent, and in practice they can differ only in the number of expansion centers,
convergence, and complexity. The cumulative atomic multipole expansion [1] employed
in this study is perhaps one of the most straightforward in practical implementation
and could be regarded as a natural supplement to Mulliken population analysis [12]. Its
convergence is illustrated in the Appendix. The electrostatic multipole term
calculated directly from atomic multipole moments constitutes the most specific and
orientation-dependent contribution to the intermolecular interaction energy for polar
or ionic systems. This allowed successful prediction of the detailed structure of small
hydrogen-bonded dimers [13-17] as well as other van der Waals complexes involving
aromatic molecules [18]. The remaining components of the intermolecular interaction
energy are more transferable and could be represented by more or less universal
atom-atom potentials derived from ab initio calculations [19-21]. Therefore, further
applications of the above-mentioned atom-atom potentials to biologically important
systems (nucleic acids, proteins, enzymes, etc.) will require knowledge of the atomic
multipoles for the corresponding building blocks, namely nucleic bases and
amino acids.

Despite the significant interest in studies on properties and interactions of nucleic
acid bases, the only explicitly published set of atomic multipole moments has been
obtained within the semiempirical IEHT approach [2]. Therefore, the ab initio atomic
multipoles presented in this contribution constitute an indispensable supplement to
nonempirical atom-atom potentials [20], enabling calculation of the intermolecular
interaction energy and its components as well as the study of other properties of DNA
bases available from the multipole expansion. Besides the normal nitrogen DNA bases,
our calculations have been extended to include the corresponding protonated and rare
forms as well as 2-aminopurine analogs. Their interactions within DNA may be
responsible for the observed sequence-specific mutation "hot spots" and will be the
subject of our forthcoming study.

In more simplified models, the anisotropy of the local charge distribution is frequently
accounted for by locating additional point charges outside atomic nuclei. However,
previously introduced models of this kind [22-31] have been obtained in an arbitrary
and unsystematic manner, either by fitting point charge values and locations to
approximate molecular multipole moments [23-25] or to molecular electrostatic potentials
[26-31]. The above-mentioned models [22-31] therefore depend on the arbitrary choice of
the number, location, and value of the point charges. While reproducing some global
properties, they do not always correctly represent the anisotropy of local charge
distributions. Sometimes such a procedure even leads to unphysically large point
charges [28]. With the aid of CAMM we are in a position to construct point charge
models of desirable precision analytically and in a much more systematic manner. In
contrast to methods based on global criteria [22-31], our model allows the preservation
of the local anisotropies of the charge distribution represented by atomic multipoles
and of all corresponding molecular moments.


Cumulative  Atomic  Multipole  Moments 

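Theoretical molecular multipole moments are determined within the LCAO MO SCF
approach as expectation values of the operator u^k v^l w^m (u, v, w = x, y, z)

$$\langle u^k v^l w^m \rangle = \sum_i^{\text{atoms}} Z_i\, u_i^k v_i^l w_i^m - \sum_I^{\text{AO}} \sum_J^{\text{AO}} P_{IJ} \langle I | u^k v^l w^m | J \rangle = \sum_i^{\text{atoms}} \langle u^k v^l w^m \rangle_i \qquad (1)$$

where Z_i denotes the nuclear charge, ⟨I | u^k v^l w^m | J⟩ a one-electron multipole
moment integral, and P_IJ a density matrix element. Transforming each atomic multipole
moment ⟨u^k v^l w^m⟩_i to a local coordinate system with origin at the i-th atomic
center (u_i, v_i, w_i), we obtain the cumulative atomic multipole moments (CAMM) M_i^{klm}

$$M_i^{klm} = \langle u^k v^l w^m \rangle_i - \sum_{k' \ge 0} \sum_{l' \ge 0} \sum_{m' \ge 0} \binom{k}{k'} \binom{l}{l'} \binom{m}{m'} u_i^{k-k'} v_i^{l-l'} w_i^{m-m'} M_i^{k'l'm'}, \qquad klm \ne k'l'm' \qquad (2)$$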
In this formulation M_i^{000} is equivalent to the Mulliken atomic charge [12]. Each of
the higher multipoles corresponds to contributions not included in the lower moments.
The cumulative character of the consecutive moments of (k + l + m)th order allows the
stepwise refinement of the local charge anisotropy and multipole expansion. Cumulative
second-order moments M_i^{klm} (k + l + m = 2) can be additionally transformed into
traceless quadrupole tensors Q_i^{uv}

$$Q_i^{uv} = 0.5 \left( 3 M_i^{uv} - \delta_{uv} \sum_{w} M_i^{ww} \right) \qquad (u, v, w = x, y, z) \qquad (3)$$
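
As an illustration of Eq. (3), the following Python sketch converts a symmetric tensor of cumulative second moments on one atom into the corresponding traceless quadrupole tensor. The function name and the numerical values are hypothetical and serve only to show the transformation.

```python
def traceless_quadrupole(m2):
    """Apply Eq. (3) to a 3x3 symmetric nested list of second moments M^{uv}."""
    trace = m2[0][0] + m2[1][1] + m2[2][2]
    return [
        [0.5 * (3.0 * m2[u][v] - (trace if u == v else 0.0)) for v in range(3)]
        for u in range(3)
    ]


# Hypothetical second-moment tensor for a single atom (arbitrary units).
second_moments = [
    [1.0, 0.2, 0.0],
    [0.2, 2.0, -0.1],
    [0.0, -0.1, 3.0],
]

if __name__ == "__main__":
    for row in traceless_quadrupole(second_moments):
        print(row)
```

The diagonal of the result sums to zero by construction, while each off-diagonal element is simply 1.5 times the corresponding second moment.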

The molecular wavefunctions of the nucleic acid bases have been calculated within the
ab initio SCF-LCAO-MO method using the minimal all-valence MODPOT basis set
(3s3p/3s) → [1s1p/1s] with the ab initio effective core model potentials [32]. This
well-balanced basis set yielded, for several hydrogen-bonded complexes, intermolecular
interaction energies very close to results obtained in the extended 6-31G* and
6-31G** basis sets [33]. It must be noted that MODPOT results, available at a fraction
of the cost, compare favorably with those obtained in more expensive basis sets
[33, 34] which underestimate molecular dipole moments and intermolecular interactions.
The sample Cartesian coordinates (X, Y, Z), atomic monopoles (Q), atomic
dipoles (DX, DY, DZ), and traceless atomic quadrupoles (QXX, QYY, QZZ, QXY, QXZ,
QYZ) for 2-aminopurine are presented in Table I. Analogous data for
normal, rare, and protonated forms of adenine, guanine, uracil, cytosine, and
thymine (Figs. 1 and 2) are available on request [either in printed form or on one
DSDD 5.25" floppy diskette mailed to one of the authors (WAS)].

Analytic  Point  Charge  Representation  of  Atomic  and 
Molecular  Multipoles 

Each  multipole  can  be  represented  by  a  minimal  set  of  n  properly  arranged  point 
charges  qp 

n 

Mk!m  ~  X  ~  -  U',f  (4) 

r  i 

There is no a priori way of determining the values ($q_p$) and locations ($x_p, y_p, z_p$) of
the point charges without additional constraints. In this study we introduce only one
quite natural assumption: that the charge located at nucleus ($x_i, y_i, z_i$) is equal to the
corresponding atomic core charge $Z_i$. In the case of the MODPOT basis set, the core charge
equals the number of valence electrons of the corresponding atom. To preserve this
condition all other off-nuclear charges have to be located on a sphere of radius $R_i$

$$R_i = \sqrt{-\mathrm{Tr}(T_i M_i T_i^{\dagger}) / (Z_i - M_i^{000})} \qquad (5)$$

where $\mathrm{Tr}(T_i M_i T_i^{\dagger})$ stands for the trace of the (diagonal) atomic second-moment
tensor $M_i$ transformed by rotation matrix $T_i$ into the corresponding principal axes.
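
The role of the sphere radius can be checked numerically. The sketch below is only an illustration: it assumes the radius formula R = sqrt(-Tr(M') / (Z - M000)) and the quadrupole charge values M'/(2R^2) placed at +R and -R along each principal axis. The function names and the numerical inputs are hypothetical (the second moments were chosen to resemble a nitrogen atom and are not taken from Table I).

```python
import math


def sphere_radius(core_charge, monopole, principal_moments):
    """Radius R = sqrt(-Tr(M') / (Z - M000)) of the sphere carrying the off-nuclear charges."""
    trace = sum(principal_moments)
    return math.sqrt(-trace / (core_charge - monopole))


def quadrupole_charges(principal_moments, radius):
    """One charge value per principal axis; each value is used twice (at +R and at -R)."""
    return [m / (2.0 * radius ** 2) for m in principal_moments]


# Illustrative inputs (assumed values, atomic units)
Z = 5.0                                   # core charge of a nitrogen atom in a valence-only basis
M000 = -0.358842                          # atomic monopole (net atomic charge)
M_principal = [-4.3625, -4.2805, -3.7983]  # diagonal second moments in principal axes

R = sphere_radius(Z, M000, M_principal)
q = quadrupole_charges(M_principal, R)

# Core charge plus the six quadrupole charges should give back the atomic monopole.
total = Z + 2.0 * sum(q)
print("R =", R)
print("quadrupole charges =", q)
print("recovered monopole =", total)
```

With this choice of R the six quadrupole charges sum to M000 - Z, so the integer core charge at the nucleus and the atomic monopole are preserved at the same time.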

Figure 1. Pyrimidine nucleic acid bases (B) and their protonated (BH+) and rare
tautomeric forms (B*). (a) cytosine; (b) thymine; (c) uracil.

Table II gives analytic expressions for point charge values $q_p$ and their
locations ($x_p, y_p, z_p$) for atomic and molecular multipole expansions truncated at the
monopole (k + l + m = n = 0), dipole (n = 1), or quadrupole (n = 2) level. Trivially, the first
expansion (n = 0) is equivalent to the Mulliken population analysis. The second
model (n = 1) reproducing atomic and molecular moments has been utilized in our
earlier calculations [35,36]. Equivalent, but apparently more complicated, point charge
models preserving molecular dipole moments have been proposed elsewhere [24,37].
However, our later calculations including also quadrupole and octopole moments
[1,33,34] indicate that atomic quadrupole moments play an essential role in
the description of intermolecular interactions and other properties. Therefore, we
recommend here the use of at least the quadrupole model (n = 2). The importance of the
quadrupole contribution can be judged directly by comparison of magnitudes of point
charge values derived from atomic dipoles (p = 2, 3) and quadrupoles (p =
Figure 2. Purine nucleic acid bases (B) and their protonated (BH+) and rare tautomeric
forms (B*). (a) adenine; (b) guanine; (c) 2-aminopurine.
Table II. Point charge values q_p and their locations u_p (u = x, y, z) equivalent to atomic and molecular
multipole expansions truncated at monopole (n = 0), dipole (n = 1), and quadrupole (n = 2) level.

Expansion   Point      Point charge            Point charge
level n     charge p   value q_p               location u_p

0           1          M_i^{000}               u_i

1           1          M_i^{000}               u_i
            2          |\mu_i|/(2R_i)          u_i - M_i^{u} R_i/|\mu_i|
            3          -|\mu_i|/(2R_i)         u_i + M_i^{u} R_i/|\mu_i|

2           1          Z_i                     u_i
            2          |\mu_i|/(2R_i)          u_i - M_i^{u} R_i/|\mu_i|
            3          -|\mu_i|/(2R_i)         u_i + M_i^{u} R_i/|\mu_i|
            4          M_i'^{11}/(2R_i^2)      u_i - T_i^{u1} R_i
            5          M_i'^{11}/(2R_i^2)      u_i + T_i^{u1} R_i
            6          M_i'^{22}/(2R_i^2)      u_i - T_i^{u2} R_i
            7          M_i'^{22}/(2R_i^2)      u_i + T_i^{u2} R_i
            8          M_i'^{33}/(2R_i^2)      u_i - T_i^{u3} R_i
            9          M_i'^{33}/(2R_i^2)      u_i + T_i^{u3} R_i

\mu_i = \sqrt{\sum_u (M_i^{u})^2}

M_i' = T_i M_i T_i^{\dagger}

4, 5, 6, 7, 8, 9). Typically, the latter are about an order of magnitude larger than the
former. The corresponding point charge representation of molecular and atomic
multipoles for 2-aminopurine is presented in Table III at the quadrupole level (n = 2). For
each atom the Cartesian coordinates and values of the 9 point charges are
given there. NUCLEI denotes the integer charge of the atomic core (p = 1);
DIPOLE, the two point charges (p = 2, 3) representing the atomic dipole; and
QUADRUPOLE, the six point charges (p = 4, 5, 6, 7, 8, 9) representing the atomic quadrupole
moment. The lower level representations (n = 0, 1) may be easily obtained from the
data presented in this paper.
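
The structure of one Table III block can be verified with a few lines of code. The sketch below uses the nine N(1) entries as printed in Table III. The variable names are ours, and the checks (net charge, common radius, dipole versus quadrupole charge magnitudes) are only a sanity test of the data, not part of the original procedure.

```python
import math

# N(1) rows of Table III: (x, y, z, charge, label), all in atomic units
N1 = [
    (0.000000, 0.000000, 0.000000, 5.000000, "NUCLEI"),
    (-1.238989, 0.000000, -0.886874, -0.151723, "DIPOLE"),
    (1.238989, 0.000000, 0.886874, 0.151723, "DIPOLE"),
    (1.294647, 0.000000, 0.803447, -0.939526, "QUADRUPOLE"),
    (-1.294647, 0.000000, -0.803447, -0.939526, "QUADRUPOLE"),
    (0.000000, -1.523693, 0.000000, -0.921877, "QUADRUPOLE"),
    (0.000000, 1.523693, 0.000000, -0.921877, "QUADRUPOLE"),
    (-0.803447, 0.000000, 1.294647, -0.818018, "QUADRUPOLE"),
    (0.803447, 0.000000, -1.294647, -0.818018, "QUADRUPOLE"),
]

nucleus = N1[0][:3]

# Net atomic charge carried by the nine point charges
net_charge = sum(row[3] for row in N1)

# Distance of every off-nuclear charge from the nucleus (should be one common radius)
radii = [math.dist(row[:3], nucleus) for row in N1[1:]]

# Dipole moment of the point charge set relative to the nucleus
dipole = [sum(row[3] * (row[i] - nucleus[i]) for row in N1) for i in range(3)]

# Compare magnitudes of dipole-derived and quadrupole-derived charges
dipole_q = max(abs(row[3]) for row in N1 if row[4] == "DIPOLE")
quad_q = [abs(row[3]) for row in N1 if row[4] == "QUADRUPOLE"]
ratio = min(quad_q) / dipole_q

print("net charge:", net_charge)
print("radii:", radii)
print("dipole:", dipole)
print("smallest quadrupole/dipole charge ratio:", ratio)
```

All eight off-nuclear charges sit at the same distance from the nucleus, the paired quadrupole charges cancel in the dipole sum, and the quadrupole-derived charges are several times larger than the dipole-derived ones.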

Although such point charge models are inferior to the corresponding
atomic multipole models (due to overlapping atomic spheres), they yield valuable
conceptual information. They allow one to illustrate directly the relationship between
multipoles and various chemical concepts of bonding, arrangement of lone electron
pairs, etc. In addition, it is evident that models with a smaller number of point charges
may be deficient. The point charge representations for the remaining molecules are
available on request [either in printed form or on one DSDD 5.25" floppy diskette
mailed to one of the authors (WAS)].
Table III. Point charge representation of 2-aminopurine (amino form).

Atom           X(au)       Y(au)       Z(au)  Charge(au)

N(1)         .000000     .000000     .000000    5.000000  NUCLEI
N(1)       -1.238989     .000000    -.886874    -.151723  DIPOLE
N(1)        1.238989     .000000     .886874     .151723  DIPOLE
N(1)        1.294647     .000000     .803447    -.939526  QUADRUPOLE
N(1)       -1.294647     .000000    -.803447    -.939526  QUADRUPOLE
N(1)         .000000   -1.523693     .000000    -.921877  QUADRUPOLE
N(1)         .000000    1.523693     .000000    -.921877  QUADRUPOLE
N(1)        -.803447     .000000    1.294647    -.818018  QUADRUPOLE
N(1)         .803447     .000000   -1.294647    -.818018  QUADRUPOLE
C(2)         .000000     .000000    2.551131    4.000000  NUCLEI
C(2)       -1.630809     .000000    2.391209    -.021537  DIPOLE
C(2)        1.630809     .000000    2.711052     .021537  DIPOLE
C(2)         .000000   -1.638631    2.551131    -.731023  QUADRUPOLE
C(2)         .000000    1.638631    2.551131    -.731023  QUADRUPOLE
C(2)        -.917801     .000000    1.193650    -.599961  QUADRUPOLE
C(2)         .917801     .000000    3.908611    -.599961  QUADRUPOLE
C(2)       -1.357481     .000000    3.468931    -.470505  QUADRUPOLE
C(2)        1.357481     .000000    1.633330    -.470505  QUADRUPOLE
N(3)        2.067806     .000000    3.996336    5.000000  NUCLEI
N(3)        2.010491     .000000    5.527925    -.149810  DIPOLE
N(3)        2.125121     .000000    2.464747     .149810  DIPOLE
N(3)        2.040694     .000000    2.463915    -.935336  QUADRUPOLE
N(3)        2.094918     .000000    5.528758    -.935336  QUADRUPOLE
N(3)        2.067806   -1.532661    3.996336    -.929317  QUADRUPOLE
N(3)        2.067806    1.532661    3.996336    -.929317  QUADRUPOLE
N(3)         .535385     .000000    4.023448    -.832141  QUADRUPOLE
N(3)        3.600228     .000000    3.969224    -.832141  QUADRUPOLE
C(4)        4.212948     .000000    2.730733    4.000000  NUCLEI
C(4)        2.846673     .000000    1.822821    -.032695  DIPOLE
C(4)        5.579223     .000000    3.638646     .032695  DIPOLE
C(4)        4.212948   -1.640431    2.730733    -.777406  QUADRUPOLE
C(4)        4.212948    1.640431    2.730733    -.777406  QUADRUPOLE
C(4)        4.084835     .000000    1.095313    -.648090  QUADRUPOLE
C(4)        4.341061     .000000    4.366154    -.648090  QUADRUPOLE
C(4)        2.577527     .000000    2.858846    -.449250  QUADRUPOLE
C(4)        5.848368     .000000    2.602620    -.449250  QUADRUPOLE
C(5)        4.422072     .000000     .102873    4.000000  NUCLEI
C(5)        2.895037     .000000    -.628821    -.078135  DIPOLE
C(5)        5.949108     .000000     .834567     .078135  DIPOLE
C(5)        4.422072    1.693285     .102873    -.842947  QUADRUPOLE
C(5)        4.422072   -1.693285     .102873    -.842947  QUADRUPOLE
C(5)        3.965275     .000000    1.733379    -.662146  QUADRUPOLE
C(5)        4.878870     .000000   -1.527633    -.662146  QUADRUPOLE
C(5)        2.791566     .000000    -.353924    -.503586  QUADRUPOLE
C(5)        6.052578     .000000     .559670    -.503586  QUADRUPOLE
C(6)        2.180201     .000000   -1.221910    4.000000  NUCLEI
C(6)        2.492399     .000000   -3.004560    -.186526  DIPOLE
C(6)        1.868002     .000000     .560740     .186526  DIPOLE
C(6)        1.784971     .000000   -2.988008    -.728909  QUADRUPOLE
C(6)        2.575430     .000000     .544188    -.728909  QUADRUPOLE
C(6)        2.180201   -1.809781   -1.221910    -.685634  QUADRUPOLE
C(6)        2.180201    1.809781   -1.221910    -.685634  QUADRUPOLE
C(6)         .414103     .000000    -.826681    -.620877  QUADRUPOLE
C(6)        3.946299     .000000   -1.617139    -.620877  QUADRUPOLE
N(7)        6.983974     .000000    -.579754    5.000000  NUCLEI
N(7)        7.632219     .000000   -1.969702    -.156017  DIPOLE
N(7)        6.335729     .000000     .810194     .156017  DIPOLE
N(7)        6.609711     .000000     .907561    -.899200  QUADRUPOLE
N(7)        7.358237     .000000   -2.067069    -.899200  QUADRUPOLE
N(7)        6.983974    1.533681    -.579754    -.897116  QUADRUPOLE
N(7)        6.983974   -1.533681    -.579754    -.897116  QUADRUPOLE
N(7)        5.496659     .000000    -.954017    -.864146  QUADRUPOLE
N(7)        8.471289     .000000    -.205491    -.864146  QUADRUPOLE
C(8)        8.198682     .000000    1.485707    4.000000  NUCLEI
C(8)        9.949063     .000000    1.450991    -.256749  DIPOLE
C(8)        6.448301     .000000    1.520423     .256749  DIPOLE
C(8)        8.198682   -1.750725    1.485707    -.718968  QUADRUPOLE
C(8)        8.198682    1.750725    1.485707    -.718968  QUADRUPOLE
C(8)        9.857441     .000000    2.045668    -.657986  QUADRUPOLE
C(8)        6.539922     .000000     .925747    -.657986  QUADRUPOLE
C(8)        7.638721     .000000    3.144467    -.603617  QUADRUPOLE
C(8)        8.758642     .000000    -.173052    -.603617  QUADRUPOLE
N(9)        6.612196     .000000    3.591045    5.000000  NUCLEI
N(9)        7.061668     .000000    5.065895    -.128566  DIPOLE
N(9)        6.162724     .000000    2.116195     .128566  DIPOLE
N(9)        6.612196   -1.541819    3.591045    -.991587  QUADRUPOLE
N(9)        6.612196    1.541819    3.591045    -.991587  QUADRUPOLE
N(9)        6.607928     .000000    2.049231    -.890461  QUADRUPOLE
N(9)        6.616465     .000000    5.132858    -.890461  QUADRUPOLE
N(9)        5.070383     .000000    3.595313    -.866586  QUADRUPOLE
N(9)        8.154010     .000000    3.586776    -.866586  QUADRUPOLE
H(6)        2.129035     .000000   -3.243270    1.000000  NUCLEI
H(6)        2.071250     .000000   -1.853238    -.034938  DIPOLE
H(6)        2.186819     .000000   -4.633301     .034938  DIPOLE
H(6)        2.129035    1.391232   -3.243270    -.138669  QUADRUPOLE
H(6)        2.129035   -1.391232   -3.243270    -.138669  QUADRUPOLE
H(6)        3.515785     .000000   -3.354842    -.115404  QUADRUPOLE
H(6)         .742284     .000000   -3.131697    -.115404  QUADRUPOLE
H(6)        2.017462     .000000   -4.630021    -.109141  QUADRUPOLE
H(6)        2.240607     .000000   -1.856519    -.109141  QUADRUPOLE
H(9)        7.079802     .000000    5.412250    1.000000  NUCLEI
H(9)        6.974798     .000000    4.131721    -.032143  DIPOLE
H(9)        7.184806     .000000    6.692779     .032143  DIPOLE
H(9)        7.079802    1.284826    5.412250    -.121817  QUADRUPOLE
H(9)        7.079802   -1.284826    5.412250    -.121817  QUADRUPOLE
H(9)        8.255218     .000000    4.893426    -.106723  QUADRUPOLE
H(9)        5.904387     .000000    5.931074    -.106723  QUADRUPOLE
H(9)        6.560978     .000000    4.236835    -.082604  QUADRUPOLE
H(9)        7.598626     .000000    6.587665    -.082604  QUADRUPOLE
H(8)       10.198180     .000000    1.678589    1.000000  NUCLEI
H(8)        8.773540     .000000    1.550927    -.042333  DIPOLE
H(8)       11.622820     .000000    1.806251     .042333  DIPOLE
H(8)       10.198180    1.430347    1.678589    -.134121  QUADRUPOLE
H(8)       10.198180   -1.430347    1.678589    -.134121  QUADRUPOLE
H(8)        9.835962     .000000    3.062313    -.116637  QUADRUPOLE
H(8)       10.560400     .000000     .294865    -.116637  QUADRUPOLE
H(8)        8.814455     .000000    1.316372    -.113558  QUADRUPOLE
H(8)       11.581900     .000000    2.040806    -.113558  QUADRUPOLE
N(2)       -2.283915     .000000    3.662111    5.000000  NUCLEI
N(2)       -3.542500     .000000    4.616081    -.113837  DIPOLE
N(2)       -1.025329     .000000    2.708142     .113837  DIPOLE
N(2)       -2.283915   -1.579271    3.662111    -.983501  QUADRUPOLE
N(2)       -2.283915    1.579271    3.662111    -.983501  QUADRUPOLE
N(2)       -3.176697     .000000    2.359408    -.978224  QUADRUPOLE
N(2)       -1.391132     .000000    4.964814    -.978224  QUADRUPOLE
N(2)       -3.586618     .000000    4.554893    -.862997  QUADRUPOLE
N(2)        -.981212     .000000    2.769328    -.862997  QUADRUPOLE
H(N21)     -2.116560     .000000    5.533598    1.000000  NUCLEI
H(N21)     -3.091908     .000000    4.762494    -.030835  DIPOLE
H(N21)     -1.141212     .000000    6.304702     .030835  DIPOLE
H(N21)     -2.116560   -1.243345    5.533598    -.136044  QUADRUPOLE
H(N21)     -2.116560    1.243345    5.533598    -.136044  QUADRUPOLE
H(N21)      -.913016     .000000    5.845670    -.122055  QUADRUPOLE
H(N21)     -3.320103     .000000    5.221526    -.122055  QUADRUPOLE
H(N21)     -2.428632     .000000    6.737142    -.070596  QUADRUPOLE
H(N21)     -1.804488     .000000    4.330054    -.070596  QUADRUPOLE
H(N22)     -3.973713     .000000    2.840495    1.000000  NUCLEI
H(N22)     -2.803970     .000000    3.407069    -.033802  DIPOLE
H(N22)     -5.143457     .000000    2.273922     .033802  DIPOLE
H(N22)     -3.973713    1.299733    2.840495    -.124659  QUADRUPOLE
H(N22)     -3.973713   -1.299733    2.840495    -.124659  QUADRUPOLE
H(N22)     -4.559113     .000000    4.000932    -.113747  QUADRUPOLE
H(N22)     -3.388314     .000000    1.680059    -.113747  QUADRUPOLE
H(N22)     -5.134150     .000000    2.255096    -.083287  QUADRUPOLE
H(N22)     -2.813277     .000000    3.425894    -.083287  QUADRUPOLE

Source: From Ref. 47.


Electrostatic  Interaction  Energy  in  DNA  Complementary  Bases 

Cumulative atomic multipoles presented in this study can be used to evaluate the
multipole component of the electrostatic interaction energy. Such results are
presented in Table IV for hydrogen-bonded adenine-thymine and guanine-cytosine base
pairs at the corresponding experimentally determined geometries [38] and compared
with other nonempirical estimates [31,39].
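
Because the point charge sets of Table III are equivalent to the atomic multipoles up to the chosen level, a rough estimate of the same energy component can also be obtained from a plain Coulomb sum over the two charge sets. The following sketch assumes that route (it is not the multipole formula used for Table IV). The function name and the toy dipoles are hypothetical, and energies are converted with 1 hartree = 627.5095 kcal/mol.

```python
import math

HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard energy conversion factor


def coulomb_energy(charges_a, charges_b):
    """Sum of q_a * q_b / r_ab over all intermolecular pairs.

    Each entry is (x, y, z, q) in atomic units; the result is in hartree.
    """
    energy = 0.0
    for xa, ya, za, qa in charges_a:
        for xb, yb, zb, qb in charges_b:
            r = math.dist((xa, ya, za), (xb, yb, zb))
            energy += qa * qb / r
    return energy


# Toy example (hypothetical data): two head-to-tail point dipoles, 10 bohr apart
mol_a = [(0.0, 0.0, -0.5, -0.1), (0.0, 0.0, 0.5, 0.1)]
mol_b = [(0.0, 0.0, 9.5, -0.1), (0.0, 0.0, 10.5, 0.1)]

e_hartree = coulomb_energy(mol_a, mol_b)
print("E =", e_hartree, "hartree")
print("E =", e_hartree * HARTREE_TO_KCAL_PER_MOL, "kcal/mol")
```

For real base pairs the two lists would hold the nine point charges of every atom of each monomer, placed at the dimer geometry.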
Table IV. Nonempirical estimates of multipole electrostatic interaction energy in DNA complementary
base pairs (in kcal/mol). Total interaction energies in parentheses.

Basis set           Minimal [40]      STO-3G [41]    MODPOT [32]
Base pair           (7s3p/3s)         (6s3p/3s)      (3s3p/3s)         exp

Adenine-thymine     -13.9(-12.9)^a    -8.96^b        -17.0(-13.8)^c    (-13.0)^d
Guanine-cytosine    -25.4(-23.7)^a    -21.53^b       -33.5(-23.2)^c    (-21.0)^d

a Ref. 39, up to quadrupole-quadrupole term.
b Ref. 31 (approximate point charge model fitted to reproduce electrostatic molecular potentials).
c This work, up to monopole-quadrupole term. Total interaction energy calculated with the use of
nonempirical atom-atom potentials described in Ref. 20.
d Experimental values, Ref. 42.


Electrostatic multipole energies predicted with the MODPOT basis set are much larger
than values obtained in the minimal (7,3/3) → [2s1p/1s] [40] and STO-3G [41] sets. This
is consistent with our results obtained for 12 smaller hydrogen-bonded dimers [33].
MODPOT interaction energies obtained in the above-mentioned study [33] have been very
close to the reference 6-31G* and 6-31G** values, in contrast to the severely
underestimated STO-3G results. Also, the MODPOT basis gives results very close to the
all-electron calculations using the same valence basis set as the MODPOT (and the same
inner shell basis set from which the MODPOT parameters were determined) [33,34].
Our total interaction energies match closely those predicted within more complex
nonempirical potentials [39] and recently available experimental values [42].

Summary 

Molecular charge distributions obtained in LCAO-MO-SCF calculations can be analytically
decomposed into a set of cumulative atomic multipole moments (CAMM). With the aid
of CAMM one may estimate electrostatic molecular potentials, electric fields, and
electrostatic interaction energies with much better accuracy than within a one-center
molecular expansion (see Appendix). Expansions up to at least atomic quadrupoles
are preferable.

Furthermore, atomic multipoles may be analytically decomposed into a set of
off-nuclear point charges reproducing simultaneously all atomic and molecular moments.
In contrast to other available schemes of this kind, our approach allows systematic
refinement by including higher multipole moments. Recently our approach has been
extended to include correlation effects, and the lowest molecular multipole moments
match closely the available experimental results for small molecules [43].

The results presented in this contribution enable inexpensive nonempirical studies
of electrostatic interactions between nucleic acid components. The remaining
interaction energy contributions may be estimated from existing nonempirical potential
functions [20]. A recent ab initio study of Aida and Nagata [44] indicates that the
electrostatic energy has an almost linear relation to the total stacking energy of
A-DNA and B-DNA compounds. This should allow estimation of the relative stacking
energies from atomic multipole moments, in accordance with Langlet et al. [39].
Figure 3. The relative errors of various multipolar estimates of the electrostatic
interaction energy calculated in the minimal all-valence MODPOT basis set [32] as a function of
intermolecular distance R. (a) (CO2)2 dimer; (b) hydrogen-bonded HOH···FH dimer.
AMT: Atomic Multipole Truncated expansion; AET: Atomic Exponent Truncated expansion;
MMT: Molecular Multipole Truncated expansion; MET: Molecular Exponent Truncated
expansion.
Appendix

Convergence  of  Atomic  Versus  Molecular  Multipole  Expansion 

The convergence of the multipole expansion can be illustrated by the relative error
$(X_{\mathrm{appr}} - X_{\mathrm{exact}})/X_{\mathrm{exact}} \cdot 100\%$ of a property $X_{\mathrm{appr}}$ estimated from the multipole series. As the
sample properties X, we use here electrostatic molecular potentials V and electric fields E
calculated for the CO2 molecule in the minimal all-electron (6s3p) basis set [32], and
electrostatic interaction energies $E_{\mathrm{EL}}^{(1)}$ for (CO2)2 and HF···HOH dimers obtained in an SCF
decomposition scheme [33] in the minimal all-valence MODPOT (3s3p) → [1s1p/1s] basis
set [32]. Traditionally, the multipole series is truncated at the term containing the highest
multipole moment m = q, μ, Θ, Ω [moment truncated, MT(m)] and it includes all
terms involving lower moments. However, in the case of electrostatic interaction
energy evaluated from the superposition of two multipole series, the results depend strongly
on the way the entire expansion is truncated. In contrast to the multipole truncated (MT)
expansion, much better results can be obtained [45,46] when it is terminated at terms
having the same R dependence [exponent truncated, ET(n)] and contains all terms
involving lower exponents. In order to compare the convergence of (A)tomic
(AMT, AET) and (M)olecular (MMT, MET) expansions, the corresponding relative errors
have been plotted in Figures 3-5. The exponentially decreasing relative errors
at short distances [Fig. 3(a) and (b)] are due to penetration effects. They can be
estimated from nonempirical atom-atom potentials [20]. As may be seen in
Figure 3(a) and (b), the remaining multipole component of the electrostatic interaction
energy can be reasonably estimated from the cumulative atomic expansion even for
separations typical for hydrogen-bonded dimers.
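
The difference between the two truncation schemes is easy to see by listing which multipole-multipole terms each one keeps. The sketch below relies only on the standard fact that the interaction between a rank-la multipole on one centre and a rank-lb multipole on another decays as R^-(la+lb+1). The function names are ours, and rank 0, 1, 2, 3 stand for monopole, dipole, quadrupole, and octopole.

```python
def moment_truncated(m):
    """MT(m): every pair (la, lb) of multipole ranks with la <= m and lb <= m."""
    return [(la, lb) for la in range(m + 1) for lb in range(m + 1)]


def exponent_truncated(n):
    """ET(n): every pair (la, lb) whose term decays as R**-(la+lb+1) with la+lb+1 <= n."""
    return [(la, lb) for la in range(n) for lb in range(n) if la + lb + 1 <= n]


NAMES = ["monopole", "dipole", "quadrupole", "octopole"]


def describe(pairs):
    """Human-readable list of interaction terms with their distance dependence."""
    return [f"{NAMES[la]}-{NAMES[lb]} (R^-{la + lb + 1})" for la, lb in pairs]


print("MT(1):", describe(moment_truncated(1)))
print("ET(1):", describe(exponent_truncated(1)))
print("ET(3):", describe(exponent_truncated(3)))
```

MT(1) keeps the dipole-dipole term but drops the monopole-quadrupole term, although both decay as R^-3; ET(3) keeps every R^-3 contribution, and ET(1) reduces to the monopole-monopole sum.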

It can be seen that the AET(3) expansion, including all $R^{-3}$ terms, yields much better
results than the frequently used AET(1) scheme based on Mulliken charges only.
Besides, AET(1) displays strong basis set dependence, compensated
only partly in higher order expansions.


Figure 4. Relative errors of various multipolar estimates of the molecular electrostatic
potential V calculated in the minimal all-electron (6s3p/3s) basis set [32] as a function of
distance R for the CO2 molecule.

Figure 5. Relative errors of various multipolar estimates of electric field values E
calculated in the minimal all-electron (6s3p) basis set [32] as a function of distance R for the CO2
molecule.
Abstract

Most of the important conformations of biomolecules possess only trivial symmetry. Consequently,
symmetry groups have no role in the characterization of the shapes of such conformations. However, an
alternative group theoretical model, based on homology groups of algebraic topology, provides a detailed
description of shapes for all conformations. These shape groups are useful for precise comparison of
molecular shapes and are proposed as a computational tool for QSAR. A new computational method for
the determination of various shape groups is described which is suitable for the simultaneous analysis of a
pair or a family of molecular properties. In this note a general method is described and applied to the shape
of electronic charge distribution along van der Waals surfaces.


Introduction 

Molecular  shape  is  one  of  the  most  fundamental  concepts  of  chemistry.  Chemical 
reactivity  and  most  other  chemical,  physical,  and  biological  properties  of  molecules 
are  strongly  dependent  on  molecular  shape. 

It is common to consider the arrangement of the atomic nuclei, that is, the nuclear
geometry and the associated network of formal bonds, as a descriptor of molecular
shape. This is indeed common practice in the usual textbook representation of
molecules. It is evident, however, that nuclear positions and lines of formal chemical
bonds provide only a "skeleton" of the geometrical model of molecules and of
the actual molecular shape. The shape of the "body" of the molecule requires
additional information. It is the shape of the "body" of electronic charge distribution, or
the shape of electrostatic potentials, or the shape of the van der Waals surface of the
molecule, among other descriptors, which goes beyond the simple skeletal description
of molecular shape.

With the ever-increasing demands of the pharmaceutical industry for more efficient
methods of computer-aided drug design and molecular engineering, there is a new
interest in a realistic, yet simple representation of molecular shape that is suitable for
taking into account the full three-dimensional nature of the "body" of molecules. This
"body", however, is a quantum-mechanical entity, controlled by the properties of
three-dimensional electron distributions. Fortunately, only some of the properties of
electron density are relevant to molecular shape, and it is possible to extract this
information and represent it in a remarkably simple algebraic form using group theory,
in terms of symmetry-independent shape groups of electron densities [1,2]. Of
course, a much simpler and somewhat less revealing description is possible by
applying the same group theoretical technique to van der Waals surfaces.

The purpose of this article is to describe a new method for the characterization of
molecular shape in terms of two properties: electron density and van der Waals
surfaces. Both of these properties are of great interest in drug design, and the unified
treatment we present in this study is expected to be easily applicable.

The precise characterization of the shapes of molecules which possess no more
than trivial symmetry in their most stable conformations is of fundamental
importance in the analysis of biochemical problems on the molecular level. Recently a
symmetry-independent group-theoretical method has been proposed for such an
analysis [1,2]. This method is based on a curvature analysis of various contour
surfaces of the molecule or molecular fragment of interest, which leads to a concise
group theoretical description of the interrelations of surface domains of various
curvature properties. The contour surfaces considered are the equipotential contours of
the electrostatic potential generated by the molecule; the isodensity contours of
electronic charge densities; contour surfaces of molecular orbitals; or van der Waals
surfaces [1,2]. These surfaces can be calculated by routine quantum chemical methods
or by other techniques [3-5]. The resulting groups are the homology and
cohomology groups of truncated contour surfaces; among them the one-dimensional
homology groups $H^1$ contain the most relevant chemical information [1,2].

In the present article we shall describe an extension of the above technique for the
analysis of two (and by a straightforward generalization, of several) molecular
properties relevant to molecular shape. In describing the method we shall consider the
problem of electronic density variations along the van der Waals surface, a problem
of obvious relevance to the study of chemical reactivity, polarizability, solute-solvent
interactions, orientation effects, drug-receptor interactions, and so on. However,
the method is general and is applicable for the analysis of the interrelations of any
two three-dimensional molecular functions, for example, HOMO contours and total
charge density.

We shall not repeat the actual derivation of homology groups of individual contour
surfaces: the method has been described earlier [1,2], and there are excellent
textbooks available for more mathematical background [6-9]. The discussion will be
restricted to the only novel aspect of the new method: using interpenetrating surfaces
for the development of a new family of symmetry-independent shape groups.

Interpenetrating  Contour  Surfaces 

Consider a three-dimensional Cartesian coordinate system attached to the molecule
or molecular fragment of interest and two molecular functions $f_1(\mathbf{r})$ and $f_2(\mathbf{r})$ of the
three-dimensional position variable $\mathbf{r}$. The respective contour surfaces $G_1(a_1)$ and
$G_2(a_2)$ are defined as:

$$G_1(a_1) = \{\mathbf{r} : f_1(\mathbf{r}) = a_1\} \qquad (1)$$

and

$$G_2(a_2) = \{\mathbf{r} : f_2(\mathbf{r}) = a_2\} \qquad (2)$$

for some suitably chosen constants $a_1$ and $a_2$. For example, one may picture $G_1(a_1)$ as
the charge density contour drawn about a molecule where the charge density $f_1(\mathbf{r})$ is
equal to the value $a_1$, whereas $G_2(a_2)$ may be taken as the van der Waals contour
surface. For the latter choice one may define function $f_2(\mathbf{r})$ as:

$$f_2(\mathbf{r}) = \begin{cases} 1 & \text{on the van der Waals surface} \\ 0 & \text{elsewhere,} \end{cases} \qquad (3)$$

and one may choose $a_2 = 1$ in Eq. (2). Alternative definitions, continuous in $\mathbf{r}$, may
also be used, which may express the "penetrability" or "hardness" of the
molecular neighborhood, for example, by taking a value for $f_2(\mathbf{r})$ less than 1.0 for
points $\mathbf{r}$ outside of the van der Waals surface and greater than 1.0 within. However,
for our present purposes definition (3) is appropriate. Since for this choice $a_2 = 1$,
the argument of $G_2(a_2)$ can be omitted and we shall write simply $G_2$.

The methods of symmetry-independent shape groups [1,2] are applicable for the
detailed characterization of the shapes of both the charge density contour $G_1(a_1)$ and
the van der Waals contour $G_2$. However, our present purpose is to analyze the
interrelations between two molecular properties. In the actual example these properties
are the shape of the electronic distribution and the shape of the van der Waals
surface. The analysis can be accomplished by generating the homology groups of the
object obtained by allowing these two surfaces to interpenetrate one another, as
shown schematically in Figure 1.

The pattern of mutual interpenetration of the two surfaces $G_1$ and $G_2$ can be
characterized by any one of four truncated surfaces:

$$G_1(f_2 \ge a_2) \qquad (4)$$

$$G_1(f_2 < a_2) \qquad (5)$$

$$G_2(f_1 \ge a_1) \qquad (6)$$

$$G_2(f_1 < a_1) \qquad (7)$$

In the above sets only those points of the original point set $G_i$ are retained which
satisfy the condition for the other function $f_j$ as stated in the parentheses. It is
sufficient to choose only one of the above representations for a shape characterization; for
example, in Figure 1 the truncated surface $G_2(f_1 \ge a_1)$ is chosen, that is, the
collection of all those points of the van der Waals surface $G_2$ where the electronic charge
density $f_1(\mathbf{r})$ is greater than or equal to the value $a_1$.

It is clear that an actual interpenetration of two general contour surfaces $G_1(a_1)$ and
$G_2(a_2)$ may occur only for restricted ranges of parameters $a_1$ and $a_2$; no
interpenetration occurs if one of these contour surfaces completely surrounds the other. For
example, in the case of an electronic density contour $G_1(a_1)$ and the van der Waals
surface $G_2$ no interpenetration occurs if the charge density contour value $a_1$ is chosen
as either too large or too small. For a large $a_1$ value the contour $G_1(a_1)$ lies much too
close to the nuclei, and hence the entire $G_1(a_1)$ surface is contained within the volume
enveloped by the van der Waals surface $G_2$. On the other hand, for a low value of
charge density parameter $a_1$ the surface $G_1(a_1)$ lies far from the nuclei, and the entire
van der Waals surface $G_2$ is "nested" within $G_1(a_1)$. In both of these extreme cases the
two contour surfaces have no common points and a formal truncation according to
conditions (4)-(7) either leaves the original contour surface unchanged, or eliminates
it altogether.

Figure 1. Interpenetrating contour surfaces $G_1(a_1)$ and $G_2$. Elimination of low charge
density domains from the van der Waals surface $G_2$ along the lines of intersection leads to the
truncated (punctured) surface $G_2(f_1 \ge a_1)$. The resulting homology groups $H^1(G_2(f_1 \ge a_1))$
depend on the charge density contour parameter $a_1$, and give a concise description of the
shapes of high and low density domains along the van der Waals surface.
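
The truncation of the van der Waals surface by a charge density threshold, including the two extreme nesting cases, can be illustrated on sampled points. Everything in the sketch below is a toy assumption: the van der Waals surface is replaced by a single sphere, the charge density by a sum of two exponentials, and all function names and parameter values are hypothetical.

```python
import math


def fibonacci_sphere(n, radius=1.0):
    """Roughly uniform sample of n points on a sphere centred at the origin."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        rho = math.sqrt(1.0 - z * z)
        points.append((radius * rho * math.cos(golden_angle * k),
                       radius * rho * math.sin(golden_angle * k),
                       radius * z))
    return points


def toy_density(r, centres):
    """Hypothetical f1: a sum of exponentials standing in for a real charge density."""
    return sum(math.exp(-math.dist(r, c)) for c in centres)


def truncate(surface, centres, a1):
    """Sampled truncated surface: points of G2 where f1(r) >= a1."""
    return [r for r in surface if toy_density(r, centres) >= a1]


G2 = fibonacci_sphere(500, radius=2.0)          # stand-in for the van der Waals surface
centres = [(0.0, 0.0, 0.6), (0.0, 0.0, -0.6)]   # two "atoms" on the z axis

for a1 in (0.01, 0.3, 10.0):
    kept = truncate(G2, centres, a1)
    print(f"a1 = {a1}: {len(kept)} of {len(G2)} surface points retained")
```

A very low threshold retains the whole surface, a very high one removes it entirely, and only an intermediate value produces a genuinely truncated surface whose topology is worth analyzing.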

In Figure 1 a chemically more interesting intermediate case is shown where the
two contour surfaces $G_1$ and $G_2$ do have common points, hence a truncation does lead
to topologically significant changes. The truncated surface $G_2(f_1 \geq a_1)$ identifies the
domains of high and low electron density along the van der Waals surface $G_2$. The
topological pattern of these domains can be described by the one-dimensional homology
group of the truncated surface, denoted by

$$H^1(G_2(f_1 \geq a_1)) \tag{8}$$

Using a construction identical to that described in detail in Ref. 1, in the case of
the example shown in Figure 1 the homology group $H^1$ obtained is isomorphic to the
Abelian group of three free generators $g_1$, $g_2$, $g_3$, and has elements of the form:

$$k_1 g_1 + k_2 g_2 + k_3 g_3 \quad (k_i \text{ are integers}) \tag{9}$$

The mutual interrelations among the low and high electron density domains along
the van der Waals surface are described in detail by the family of one-dimensional
homology groups, as the parameter value $a_1$ sweeps over a chemically significant
parameter range $[a_1', a_1'']$, where $a_1'$ and $a_1''$ are some low and high charge density
values where no interpenetration of $G_1$ and $G_2$ occurs (corresponding to the two possible
nesting arrangements: $G_2$ within $G_1$ and $G_1$ within $G_2$, respectively). For the value
$a_1 = a_1'$, the surface $G_1(a_1')$ completely surrounds the van der Waals surface $G_2$, hence

$$G_2(f_1 \geq a_1') = G_2 \tag{10}$$

and the homology group

$$H^1(G_2(f_1 \geq a_1')) = H^1(G_2) \tag{11}$$

is the trivial group. For the same value $a_1 = a_1'$ the zero- and two-dimensional
homology groups are isomorphic to one another and to the additive group of integers:

$$H^0(G_2(f_1 \geq a_1')) = H^0(G_2) \cong H^2(G_2(f_1 \geq a_1')) = H^2(G_2) \tag{12}$$

If $a_1$ is increased to a higher value at which a single interpenetration occurs, leading
to the truncation of a single, simply connected domain of low electron density along
the van der Waals surface $G_2$, then Eq. (10) no longer holds, and Eq. (11) is replaced
by the isomorphism

$$H^1(G_2(f_1 \geq a_1)) \cong H^1(G_2) \tag{13}$$

Whereas for this value $a_1$ the truncated surface has a one-dimensional homology
group isomorphic to that for the value $a_1'$, the two-dimensional homology group has
changed from one isomorphic to the additive group of integers to a trivial group.

In general, a further increase in the value of $a_1$ leads to additional interpenetrations
and truncations, which leave the zero- and two-dimensional homology groups invariant
but change the one-dimensional homology group $H^1(G_2(f_1 \geq a_1))$ to more complicated
groups, which are Abelian groups of several free generators. One example is
shown in Figure 1, where the resulting one-dimensional homology group has three
free generators. As the value $a_1$ is increased further, a larger part of $G_1(a_1)$ appears
within the volume enclosed by the van der Waals surface $G_2$ and fewer but
larger domains of $G_2$ are truncated, leading to simpler one-dimensional homology
groups of fewer generators. For a large enough charge density value $a_1$, for example,
for the value $a_1 = a_1''$, the entire charge density contour surface $G_1(a_1'')$ is enclosed by
the van der Waals surface $G_2$, and truncation condition (6) eliminates the entire surface
$G_2$. In this latter extreme case there is no topological object left to be analyzed
and no homology groups are generated.
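
The sequence of homology groups described above can be illustrated with a small sketch. It assumes, beyond what the text states, that $G_2$ is a topological sphere and that every truncated low-density domain is a simply connected disk that does not touch the others; the function name is hypothetical. Under these assumptions the Betti numbers follow from the standard result for a sphere with $k$ disks removed.

```python
def punctured_sphere_betti(num_truncated_domains):
    """Betti numbers (b0, b1, b2) of a 2-sphere with k disjoint open disks removed.

    Assumption (not from the source): every truncated domain is a simply
    connected disk and the domains do not touch one another.
    """
    k = num_truncated_domains
    if k < 0:
        raise ValueError("number of truncated domains must be non-negative")
    if k == 0:
        # Intact sphere: H^1 is trivial, H^0 and H^2 are both the integers.
        return (1, 0, 1)
    # k holes: H^1 has k - 1 free generators and H^2 becomes trivial.
    return (1, k - 1, 0)


if __name__ == "__main__":
    for k in range(5):
        print(k, punctured_sphere_betti(k))
```

In this simplified picture a single truncation leaves the one-dimensional group trivial while the two-dimensional group vanishes, and a group with three free generators would correspond to four removed disks.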

The above homology groups, combined with the shape groups of the original objects
$G_1(a_1)$ and $G_2(a_2)$, described in earlier studies (1,2), give a complete characterization
of both the shapes and the interrelations of the corresponding molecular functions, for
example, of charge densities and van der Waals surfaces.

Summary

A  method  is  described  to  extend  the  application  of  symmetry-independent  shape 
groups  of  molecules  or  molecular  fragments  to  the  characterization  of  the  interplay  of 
two  or  several  molecular  properties,  such  as  electronic  charge  densities  and  van  der 
Waals  surfaces.  In  the  example  used  as  illustration  of  the  method,  the  homology 
groups  give  a  concise  description  of  the  interrelations  between  domains  of  high  and 
low charge density along van der Waals contour surfaces of molecules. These domains
and their shapes are of fundamental importance in controlling intermolecular
interactions, such as solvent-solute and drug-receptor interactions.

Abstract

In the course of conformational motions of molecules the changes in shapes of electronic charge distributions
follow those of the nuclear framework. However, this coupling between the changes in the nuclear
geometry and electron density may depend on the actual nuclear displacement; the coupling may be weak
or strong for a given conformational motion. It is of some interest to analyze how faithfully the charge
density variations follow the nuclear displacements in a family of conformational rearrangements. In certain
cases small conformational changes may induce large changes in the shape of charge density distributions,
while in other cases large and qualitatively important conformational changes may involve
qualitatively inessential distortions in the shape of electron distributions. In this article we describe a new
classification of conformations based on those domains of nuclear configuration space within which the
“shape groups” (symmetry-independent homology groups) of the electric charge density remain invariant.
Such an analysis might be valuable when seeking correlations between molecular structure and certain
biological or biochemical activities.


Introduction 

The description of dynamical changes in molecules is of utmost importance in several
fields. In particular, the variation of certain physical properties under conformational
changes constitutes a problem of general interest. The extent of such variations
may serve, in fact, as a criterion to classify the conformational rearrangements. This
paper is essentially devoted to studying this latter possibility.

One of the useful and intuitively simple properties to characterize a molecule, and
the processes it undergoes, is the molecular shape (MS). It is commonly assumed that
at least a restricted set of molecular properties can be explained in terms of the MS.
Most biological and biochemical activities are among the characteristics correlating
directly with the MS. However, the correlation of these activities with conformational
rearrangements is comparatively less clear. The basic reason for this difference is
that a conformational change may not be followed by a significant and meaningful
modification in shape. In other words, there exists in general an “uncoupling” between
the changes in configurational space and the changes in the three-dimensional
space where the MS is described. Some rearrangements in the former space will be


Visiting scientist. On leave from INIFTA, División Química Teórica, Sucursal 4, Casilla de Correo 16.
To whom any correspondence should be addressed.

INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY: QUANTUM BIOLOGY SYMPOSIUM 14, 133-147 (1987)

© 1987 John Wiley & Sons, Inc. CCC 0360-8832/87/010133-15$04.00




ARTECA  AND  MEZEY 


more important than others, because they may lead to qualitatively significant shape
modifications. According to these modifications, one might be able to introduce a
decomposition of the configurational space into subsets. This decomposition
would provide a detailed shape characterization of conformational changes. In this
paper we provide a possible method to solve this problem.

There is no unique way to evaluate and characterize the MS. Several alternatives
have been proposed to accomplish this goal. First, isodensity contours [see, e.g.,
Refs. 1-3] and electrostatic isopotential contour diagrams [4-8, and references
quoted therein] are a common source of data qualitatively representative of MS. On
the other hand, algebraic and geometrical topology supply appropriate tools for their
characterization [7,8]. This is particularly important if, as is most common, the
application of standard point group theory results in no insight, due to total lack of
symmetry.

This article shows how the MS, defined with the aid of algebraic topology, depends
on the conformational rearrangement. The central idea is very simple and
can be summarized as follows. Let us suppose that a physically meaningful quantity
is selected to describe the MS, say, for instance, the electronic charge distribution. In
this case, an isodensity contour defines a surface in 3-space. In the overwhelming
majority of cases, this surface will be a topological 2-sphere or a collection of them;
however, tori and manifolds with higher genus are possible. Following an appropriate
procedure, which we discuss later, it is possible to associate with this contour a series
of new surfaces, no longer topologically spherical. Each of these new surfaces can be
completely characterized by the symmetry-independent homology or cohomology
groups (“shape groups”) [see, e.g., Refs. 7-10]. Accordingly, the main features of the
MS can be described, in general, by a set of homology groups.

In the above approach, the MS is associated with a given conformation. By extension,
a set of homology groups is assigned to each nuclear configuration. Based on
this assignment, we propose to describe the coupling between the shape of the charge
density distributions and the molecular rearrangements in the following way: the coupling
will be nonessential in those subsets of the configurational space where the
shape groups are invariant. According to this model, only the rearrangements leading
to transitions between “shape domains” would imply an actual essential modification
in shape-dependent properties.

In order to perform the above analysis the paper has been organized as follows: in
the Method section we review the basic principles needed to apply the method. Several
criteria that can be followed to describe important features of the MS undergoing
conformational changes are discussed. Using any of these criteria the shape analysis
can be accomplished, but in general they will lead to different, perhaps complementary
descriptions. A brief explanation is given about the construction of the shape
groups. In “Shape Regions in Configuration Space,” the method is illustrated first by
studying the conformational distribution of shape groups in the particular case of a
triatomic system. Some connectedness properties of the domains in configuration
space are also stressed. The application to larger systems is illustrated by considering
a subset of the configuration space spanned by a stretching and an internal rotation
coordinate in a substituted benzene derivative. Further comments and conclusions are
found in the final section.

Method

Let us consider a system of $N$ nuclei, and the corresponding configuration space
${}^{3N}R$, where the position vector for each nucleus is specified in terms of three Cartesian
coordinates. In order to discuss the MS we will attach a three-dimensional object to
each point in ${}^{3N}R$. If any such object is to be appropriate to characterize the MS,
then it will have to be translationally and rotationally invariant. To this end, we will
study here the conformational changes in the reduced nuclear configuration space
$M$ [11, 12]. Space $M$ is the quotient space of ${}^{3N}R$ with the equivalence relation of
superposition under rigid translation and rotation. Furthermore, it can be proved that $M$, as
well as ${}^{3N}R$, is a metric space with an appropriate metric [11, 12]. We will indicate
with $R$ an element of that space ($R \in M$), representing a nuclear configuration. In turn,
$r$ will stand for a vector in 3-space ($r \in {}^{3}R$), representing a point on the object chosen
to describe the shape. Vector $r$ will not in general represent any nuclear position.

There exist several alternatives that can be followed to define a three-dimensional
entity, useful as a model for the MS. Let us consider a function $f(r, R)$, with the
property:

$$\lim_{\|r\| \to \infty} |f(r, R)| = 0, \quad R \in M. \tag{1}$$

Function $f(r, R)$ depends parametrically on the nuclear coordinates, and it is real
and single-valued for each point in 3-space. Some one-electron, marginal properties
fulfill these requirements, together with Eq. (1); among them the total electronic charge
density and the effective electrostatic potential are the most straightforward
examples [1-8].

We will introduce now a closed contour surface in ${}^{3}R$, depending parametrically on
$R$ and on a real number $a$ related to the function $f$ as follows:

$$G(a, R) = \{r \in {}^{3}R : f(r, R) = a\}, \quad R \in M. \tag{2}$$

The number $a$ can be regarded as a parameter, or it can be fixed beforehand, or, as
we will see below, some physically significant values for $a$ can be determined implicitly
in terms of some physical constraints. $G(a, R)$ can be a single closed surface or a
collection of a number of disjoint closed surfaces.

Let $G^*(a, R)$ denote the set of points enclosed by the surface $G(a, R)$, and let us
define a new set:

$$F(a, R) = G^*(a, R) \cup G(a, R), \tag{3}$$

that is, in fact, a level set of one of two types. Depending on the function chosen,
$G^*(a, R)$ may contain points in ${}^{3}R$ where $f(r, R)$ is larger than, or smaller than, the
parameter $a$. Without any loss of generality, the following treatment is applicable to
all functions $f(r, R)$ by an appropriate choice of sign; we will consider from now on
that $f(r, R)$ is the charge density; consequently, the set (3) will contain points for
which the density is not smaller than $a$. Furthermore, the origin of the Cartesian system
of space ${}^{3}R$ will be considered to be within $F(a, R)$. With these definitions, $G$
is a constant contour surface of function $f(r, R)$ in ${}^{3}R$ and $F$ its corresponding level
set [7, 8].
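
The definitions of $G(a, R)$, $G^*(a, R)$ and $F(a, R)$ can be made concrete with a minimal sketch. The density used here is a hypothetical model (a sum of spherical Gaussians centered on the nuclei), chosen only because it decays to zero at infinity as Eq. (1) requires; it is not the density used by the authors, and the function names are illustrative.

```python
import math


def model_density(r, nuclei):
    """Hypothetical density f(r, R): one spherical Gaussian per nucleus.

    r      -- point (x, y, z) in 3-space
    nuclei -- list of (center, weight) pairs standing in for the configuration R
    """
    return sum(
        w * math.exp(-sum((ri - ci) ** 2 for ri, ci in zip(r, center)))
        for center, w in nuclei
    )


def locate(r, a, nuclei, tol=1e-9):
    """Say where the point r lies relative to the contour of value a."""
    f = model_density(r, nuclei)
    if abs(f - a) <= tol:
        return "on G(a, R)"
    return "inside G*(a, R)" if f > a else "outside F(a, R)"


def in_level_set(r, a, nuclei, tol=1e-9):
    """Membership in F(a, R): density not smaller than a."""
    return model_density(r, nuclei) >= a - tol


if __name__ == "__main__":
    single_atom = [((0.0, 0.0, 0.0), 1.0)]
    print(locate((0.0, 0.0, 0.0), 0.5, single_atom))
    print(locate((3.0, 0.0, 0.0), 0.5, single_atom))
```

For the charge density, the points of $G^*(a, R)$ are those with density above $a$, which is the sign convention adopted in the text.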

As mentioned above, the constant $a$ can be allowed to take values in a given interval;
in this case we obtain a family of isodensity contours for shape characterization


of the molecule in the conformation $R$ [8]. However, there exists a series of alternative
choices for $a$ that might prove valuable in other cases. We summarize here some
of the more interesting possibilities:

First, let us consider an arbitrary contour $G(a', R)$, with $a' > 0$ some constant.
One may set a numerical value for the volume $V(R)$ enclosed by such a surface, and
determine the value of the parameter $a$ in Eqs. (2) and (3) implicitly from the chosen
fixed value for the volume:

$$V(R) = \int_{F(a, R)} dr. \tag{4}$$

As stated above, Eq. (4) indicates the integration in ${}^{3}R$ restricted to the interior of a
contour surface. If we now require that $V(R) = V_0 = \mathrm{const}$ for all $R \in M$, then the
charge density value $a$ will depend implicitly on $V_0$, and will vary parametrically
with $R$: $a = a(V_0; R)$.
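
A minimal numerical sketch of the implicit definition $a = a(V_0; R)$ follows. It assumes a hypothetical Gaussian model density and a simple cubic grid; the grid size, box width and function names are illustrative choices, not part of the original method. Sorting the sampled densities in decreasing order makes the enclosed volume a monotonic function of the contour value, so the required $a$ can be read off directly.

```python
import math


def model_density(x, y, z, nuclei):
    """Hypothetical density: one spherical Gaussian per nucleus."""
    return sum(
        w * math.exp(-((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2))
        for (cx, cy, cz), w in nuclei
    )


def contour_value_for_volume(nuclei, v0, half_width=4.0, n=40):
    """Approximate a(V0; R): the density value whose level set F encloses volume v0.

    The density is sampled at the centers of n**3 cubic cells. The k cells of
    highest density approximate F(a, R), with k chosen so that k * dv ~ v0.
    """
    h = 2.0 * half_width / n
    dv = h ** 3
    coords = [-half_width + (i + 0.5) * h for i in range(n)]
    values = sorted(
        (model_density(x, y, z, nuclei) for x in coords for y in coords for z in coords),
        reverse=True,
    )
    k = round(v0 / dv)
    if not 1 <= k <= len(values):
        raise ValueError("v0 is outside the range resolvable on this grid")
    return values[k - 1]


if __name__ == "__main__":
    compact = [((0.0, 0.0, 0.0), 2.0)]
    stretched = [((-1.5, 0.0, 0.0), 1.0), ((1.5, 0.0, 0.0), 1.0)]
    v0 = 4.0 * math.pi / 3.0
    print(contour_value_for_volume(compact, v0))
    print(contour_value_for_volume(stretched, v0))
```

The two model configurations in the usage example give different contour values for the same enclosed volume, which is the parametric dependence of $a$ on $R$ described above.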

The shape of a density contour enclosing a given volume is a physically appealing
description of the MS. It may be linked to a van der Waals surface description of a
molecule. In principle, as $F(a(V_0; R); R)$ will change with the conformations, the
value $a$ defining the contour will also change. Consequently, the shape analysis
obtained following this criterion will be different from, and complementary to, the one
discussed previously.

Second, one may define the constant $a$ not in terms of a volume but in terms of
regions within which the number of electrons $N_e$ is conserved for all conformations.
This can be done simply by introducing the constraint on the integrated density:

$$N_e(R) = \int_{F(a, R)} f(r, R)\, dr, \tag{5}$$

where the meaning of the integration domain is the same as in Eq. (4). Requiring
$N_e(R) = N_e^0 = \mathrm{const}$, we obtain $a = a(N_e^0; R)$. The idea of using subregions in ${}^{3}R$
with a fractional number of electrons has proved to be useful in other contexts
[Ref. 13, and others quoted therein]. However, it has never been exploited to provide
a complete shape characterization of the overall molecule undergoing conformational
rearrangements, even though it would provide a conceptually clear physical approach.
From the present viewpoint, the evolution of the MS with changes in space $M$ would
be followed by focusing attention on the way in which the isodensity surface is
modified in order to keep constant the part of the total electronic charge enclosed.
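
The constraint of Eq. (5) can be sketched numerically in the same spirit. The example below again assumes a hypothetical Gaussian model density on a cubic grid (all names and grid parameters are illustrative): grid cells are accumulated in order of decreasing density until the enclosed charge first reaches the target $N_e^0$, and the density of the last cell added approximates $a(N_e^0; R)$.

```python
import math


def model_density(x, y, z, nuclei):
    """Hypothetical density: one spherical Gaussian per nucleus."""
    return sum(
        w * math.exp(-((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2))
        for (cx, cy, cz), w in nuclei
    )


def contour_value_for_charge(nuclei, n0, half_width=4.0, n=40):
    """Approximate a(N0; R): the contour value whose level set F encloses charge n0."""
    h = 2.0 * half_width / n
    dv = h ** 3
    coords = [-half_width + (i + 0.5) * h for i in range(n)]
    values = sorted(
        (model_density(x, y, z, nuclei) for x in coords for y in coords for z in coords),
        reverse=True,
    )
    enclosed = 0.0
    for value in values:
        enclosed += value * dv  # midpoint-rule contribution of one cell
        if enclosed >= n0:
            return value
    raise ValueError("n0 exceeds the total charge captured on this grid")


if __name__ == "__main__":
    atom = [((0.0, 0.0, 0.0), 1.0)]
    for target in (1.0, 3.0, 5.0):
        print(target, contour_value_for_charge(atom, target))
```

Enclosing a larger fraction of the charge requires a lower contour value, so the returned $a$ decreases as the target grows.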

It would be expected that under certain conditions reactivity and other properties
could be correlated to the steric or “volumetric” effects in molecules. In other cases
the molecular interactions might depend on the total charge enclosed in a given region
of space. The contribution of some configurational reordering to the above effects
could be estimated by carrying out the shape analysis with the appropriate
choice for constant $a$.

Third, let us consider now the following situation: for a given (initial) configuration
$R$ the density contour $G(a, R)$ is chosen for some constant $a$. Then, the integrated
density is evaluated, leading to a “fraction” of the total electronic charge, $N_e^0$.
As the next step, some underlying structure of the configuration space may be
revealed upon finding those trajectories in space $M$ along which both $N_e^0$ and $a$ are
conserved. These trajectories should be highly significant, for they correspond to the
conservation of an integrated quantity as well as a boundary condition. Trajectories
along which both $a$ and $N_e^0$ are exactly conserved may not exist for a given
initial configuration $R$. In this case we simply look for conservation of $a$ and minimum
change in $N_e(R)$ with respect to $N_e^0$. The changes in the MS along the above
paths are interesting, since they correspond to configuration changes with the least
fluctuation of charge enclosed by a chosen density contour.

In  what  follows  we  will  restrict  ourselves  to  the  first  and  simplest  criterion,  for  the 
sake  of  simplicity  and  brevity.  Nonetheless,  it  should  be  kept  in  mind  that  it  is  only 
one  among  many  other  possible  choices. 

To proceed with the MS characterization we will essentially associate a regular
simplicial complex (or a collection of them) to $G(a, R)$ and determine the homology
(or cohomology) groups induced on it by cellular decomposition. Details of the
method have been discussed previously in the literature for frozen nuclear conformations
[7, 8]; accordingly, we will include here only some basic steps adapted to our
needs in order to include conformational changes.

Let us assume that the potential energy hypersurface is pathwise connected; we can
consequently define a surjective mapping $g: [0, 1] \to M$, so that the nuclear configurations
$g(t) = R = R(t)$, $t \in [0, 1]$, collectively describe a path of a possible conformational
change. From now on the parameter $t$ will measure the evolution of the transformation
in the metric space $M$. $R(0)$ and $R(1)$ stand for the initial and final
configurations, respectively. The contour on which the analysis will be performed
[Eq. (2)] now becomes $G(a, R(t))$ [and $F(a, R(t))$ in Eq. (3)].

In order to transform $G(a, R(t))$ into a cell complex (or a collection of them) we
will first develop a contour partitioning of the surface into domains of specified
curvature properties [7, 8]. To this end, we introduce a real parameter $b$ that will take
the role of a “reference curvature” to determine subregions on $G(a, R(t))$. The use
and meaning of this new parameter is as follows: consider first a point on the contour
$G(a, R(t))$, say $r_0$, and the straight line defined by the normal to the surface at that
point. Further, consider a sphere whose center $r_0'$ is on the above straight line. The
sphere is tangent to the surface at point $r_0$, and it has radius $\lambda = \|r_0 - r_0'\|$. We may
now introduce the curvature parameter $b$ related to the radius $\lambda$ in the following way:

(1) $b = 0$ if the radius is infinite (the tangent sphere is a tangent plane [7]).

(2) $b = -1/\lambda < 0$ if the center $r_0'$ lies on the half-line extending from $r_0$
toward the interior of $G(a, R(t))$. (Notice, however, that the center may or
may not lie within $F(a, R(t))$.)

(3) $b = 1/\lambda > 0$ if the center $r_0'$ lies on the half-line extending from $r_0$ to
the exterior of $G(a, R(t))$.

As already discussed [8], one may determine a series of curvature domains on
$G(a, R(t))$ using the above “tangent spheres” and “tangent planes” for different $b$ values.
These domains of curvature represent a generalization of concave, saddle, and
convex $D_\mu(b)$ domains ($D_\mu(b) \subset G(a, R(t))$) as follows [7, 8]:

a. $r_0 \in D_0(b)$ (“concave” domain) if an infinitesimal spheric sector $d_0$ of the
tangent sphere with curvature $b$, centered about $r_0$, fulfills the property:
$d_0 \subset F(a, R(t))$.

b. $r_0 \in D_1(b)$ (“saddle” domain) if for the corresponding value of $b$ there exist
subsets $d_0' \subset d_0$ and $d_0'' \subset d_0$, so that $d_0' \subset F(a, R(t))$ and $d_0'' \not\subset F(a, R(t))$.

c. $r_0 \in D_2(b)$ (“convex” domain) if $d_0 \not\subset G^*(a, R(t))$, for the chosen value of $b$.

The notation $D_\mu(b)$ ($\mu = \mu(b) = 0, 1, 2$) follows the convention, previously used
in the literature, of counting the number of negative eigenvalues of the two-dimensional
Hessian matrix at $r_0 \in G(a, R(t))$, for $b = 0$ [7]. If in our case $b \neq 0$, $\mu$ will simply
count how many of the above eigenvalues are smaller than the value $b$ [8].
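
The eigenvalue-counting rule for the index $\mu(b)$ can be sketched as follows. The example assumes the local two-dimensional Hessian at a surface point is already available as a symmetric 2x2 matrix with entries `hxx`, `hxy`, `hyy` (how it is obtained from the contour is not covered here); the function name and argument names are hypothetical.

```python
import math

DOMAIN_NAMES = {0: "D_0 (concave)", 1: "D_1 (saddle)", 2: "D_2 (convex)"}


def curvature_index(hxx, hxy, hyy, b=0.0):
    """Index mu(b): number of eigenvalues of [[hxx, hxy], [hxy, hyy]] smaller than b."""
    mean = 0.5 * (hxx + hyy)
    spread = math.hypot(0.5 * (hxx - hyy), hxy)
    eigenvalues = (mean - spread, mean + spread)  # closed form for a symmetric 2x2 matrix
    return sum(1 for ev in eigenvalues if ev < b)


if __name__ == "__main__":
    for hessian in [(1.0, 0.0, 1.0), (1.0, 0.0, -1.0), (-1.0, 0.0, -1.0)]:
        mu = curvature_index(*hessian)
        print(hessian, "->", DOMAIN_NAMES[mu])
```

Changing the reference curvature `b` shifts the threshold against which the eigenvalues are compared, so the same point can move from one domain type to another as `b` varies.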

After completing the analysis of curvature explained above, we have a set $\{D_\mu(b)\}$
of domains associated with $G(a, R(t))$. These domains are not necessarily connected.
As a result, the previous contour partitioning can be used to obtain objects topologically
different from a sphere by proceeding in this way: for a given value of $b$, choose a
particular index value $\mu = \nu(b)$ ($\nu = 0$, 1, or 2) and cut the $D_\nu(b)$ domains from
$G(a, R(t))$. After this, we end up with a collection of objects:

$$A = \{[G_{\nu(b)}(a, R(t))]^{(i)},\ i = 1, 2, \ldots, J\}. \tag{6}$$

$G_{\nu(b)}$ is disconnected if $J > 1$. In this case, $[G_{\nu(b)}(a, R(t))]^{(i)}$ stands for one of the pieces
(maximum connected components) left from $G(a, R(t))$. It is clear that if $D_\mu(b) \subset
[G_{\nu(b)}(a, R(t))]^{(i)}$, then $\mu \neq \nu$.

One may transform each subset $[G_{\nu(b)}(a, R(t))]^{(i)}$ into a regular simplicial cell
complex [9, 10],

$$K^{(i)}(\nu, b, a, t) = |[G_{\nu(b)}(a, R(t))]^{(i)}|, \tag{7}$$

upon cell decomposition by triangulation. From now on we shall abbreviate the notation
as $K^{(i)}(t) = K^{(i)}(\nu, b, a, t)$, understanding that $\nu$, $b$, and $a$ have been chosen
beforehand. In this way attention is focused on the conformational parameter $t$
describing the present conformation in $M$.

The methods of homology (or cohomology) theory allow one to characterize completely
the complex $K^{(i)}(t)$ upon finding some of its topological and homotopical
invariants [7-10].

Using, for instance, the approach of homology theory, the central idea of the
method can be summarized as follows: Let $C_p^{(i)}(t)$ be the set of all $p$-dimensional cells
of $K^{(i)}(t)$. In our present case $p = 0$, 1, and 2. [For the basic concepts used here and
henceforth the reader is referred, for example, to Refs. 7 and 8.] The pair formed by
$C_p^{(i)}(t)$ and a boundary operator $\Delta_p: C_p^{(i)}(t) \to C_{p-1}^{(i)}(t)$, defined in terms of an incidence
function between cells, induces a set of chain complexes on $K^{(i)}(t)$, for different values
of $p$. Moreover, for each value of $p$ we can build the following two Abelian subgroups
of chains:

$$Z_p(K^{(i)}(t)) = \mathrm{Ker}\, \Delta_p, \tag{8a}$$

$$B_p(K^{(i)}(t)) = \mathrm{Im}\, \Delta_{p+1}; \quad Z_p(K^{(i)}(t)) \supset B_p(K^{(i)}(t)). \tag{8b}$$

The group $Z_p$ represents the set of all $p$-dimensional chains with zero boundary
($p$-cycles). Taking into consideration the operator property $\Delta_p \Delta_{p+1} = 0$ [9, 10], we
know that the boundary of every $(p + 1)$-dimensional chain is a $p$-cycle. Eq. (8b)
then represents the set of all bounding $p$-cycles. As a result, the $p$-dimensional
homology group $H_p(K^{(i)}(t))$ is obtained as the quotient group (or, in the case of additive
notation, the difference group):

$$H_p(K^{(i)}(t)) = Z_p(K^{(i)}(t)) / B_p(K^{(i)}(t)). \tag{9}$$

These groups are finitely generated Abelian groups. According to the fundamental
theorem of finitely generated Abelian groups [9, 10], the group (9) is isomorphic to the
direct sum of a number of free cyclic Abelian groups and a number of finite cyclic (torsion)
groups. The rank of the free component is the integer $b_p(K^{(i)}(t))$, known as the
$p$-th Betti number of the homology group. In all the examples in this paper no torsion
component is present. Consequently $H_p$ has $b_p$ generators. These numbers are
topological and homotopical invariants. Following this procedure, we obtain a detailed
characterization of the shape of the isodensity contour $G(a, R(t))$ in terms of the set
of Betti numbers of the cell complexes related to the disjoint pieces $[G_{\nu(b)}(a, R(t))]^{(i)}$
[Eq. (6)]. This set can be described in terms of the $J$-tuple:

$$e_p(t) = (b_p(K^{(1)}(t)), b_p(K^{(2)}(t)), \ldots, b_p(K^{(J)}(t))), \quad p = 0, 1, 2; \quad J \geq 1. \tag{10}$$
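
The passage from a triangulated surface piece to its Betti numbers, Eqs. (8a)-(10), can be sketched with elementary linear algebra. The example below is an illustration under stated assumptions, not the authors' implementation: it works over the rationals (adequate here because the text states that no torsion is present), takes the triangulation as a plain list of vertex triples, and uses hypothetical function names. The rank of each boundary matrix gives the dimensions of the cycle and boundary groups, from which $b_p = \dim Z_p - \dim B_p$.

```python
from fractions import Fraction
from itertools import combinations


def matrix_rank(matrix):
    """Rank of an integer matrix over the rationals, by Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    ncols = len(rows[0]) if rows else 0
    found = 0
    for col in range(ncols):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            if factor:
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[found])]
        found += 1
    return found


def betti_numbers(triangles):
    """Betti numbers (b0, b1, b2) of the 2-complex spanned by the given triangles."""
    tris = sorted({tuple(sorted(t)) for t in triangles})
    edges = sorted({e for t in tris for e in combinations(t, 2)})
    verts = sorted({v for t in tris for v in t})
    v_idx = {v: i for i, v in enumerate(verts)}
    e_idx = {e: i for i, e in enumerate(edges)}

    # Boundary of an edge (u, v) is v - u.
    d1 = [[0] * len(edges) for _ in verts]
    for (u, v), j in e_idx.items():
        d1[v_idx[u]][j] = -1
        d1[v_idx[v]][j] = 1

    # Boundary of a triangle (a, b, c) is (b, c) - (a, c) + (a, b).
    d2 = [[0] * len(tris) for _ in edges]
    for j, (a, b, c) in enumerate(tris):
        d2[e_idx[(b, c)]][j] = 1
        d2[e_idx[(a, c)]][j] = -1
        d2[e_idx[(a, b)]][j] = 1

    r1, r2 = matrix_rank(d1), matrix_rank(d2)
    b0 = len(verts) - r1          # every 0-chain is a cycle
    b1 = (len(edges) - r1) - r2   # dim Ker(d1) - dim Im(d2)
    b2 = len(tris) - r2           # dim Ker(d2); there are no 3-cells
    return (b0, b1, b2)


if __name__ == "__main__":
    sphere = list(combinations(range(4), 3))  # hollow tetrahedron
    print(betti_numbers(sphere))
    print(betti_numbers(sphere[1:]))          # one face removed
```

A hollow tetrahedron stands in for an untruncated spherical contour; removing faces mimics cutting out a curvature domain.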

In what follows we will restrict the discussion to the most important case of the
one-dimensional chains (i.e., we shall discuss only $e_1(t)$). As we mentioned above,
$e_p(t)$ will be different for a different choice of the level-set parameter $a$, as well as for
an alternative choice of the reference curvature $b$. Furthermore, a truncation of the
domains $D_{\nu'}(b)$, instead of $D_\nu(b)$ ($\nu' \neq \nu$), would lead to a different, complementary
description.

The previous analysis shows that a very detailed shape characterization is available
for every transformation path $R(t)$ in configuration space. In order to focus attention
on some particular effects one may consider only a range of curvatures $b$ for a
given $a$, or vice versa. In the next section we illustrate briefly some of the possibilities
by analyzing some simple conformational changes.

It is clear that along a trajectory $R(t)$ the more interesting points are those at which
some changes in the $J$-tuple $e_1(t)$ take place, either in the values or in the number of
its components. When crossing those points (“transition points”), which we will indicate
with $R(t_1^*)$, $R(t_2^*)$, and so forth (in order of occurrence), an infinitesimally small
change in configuration space is accompanied by a significant change in the MS.
This can be expressed in a different way: consider two subsequent “transition points”
$R(t_i^*)$ and $R(t_{i+1}^*)$; then, for all $R(t)$ with $t_i^* < t < t_{i+1}^*$, the overall topological features
of the shape are invariant under the change in space $M$. Our method then
provides a framework to describe the “transition points” $\{R(t_i^*)\}$ and their interrelations.
In terms of all paths and their transition points the metric space $M$ can be partitioned
into domains where the shape is essentially conserved (“shape regions”). As
discussed above, this constancy is expressed in terms of the invariance of the set of
Betti numbers. In the next section some examples of different “shape regions” in $M$
are displayed.
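
Locating transition points along a sampled path can be sketched as follows. The input is assumed to be a list of pairs $(t, e_1(t))$ already computed at increasing values of $t$; the sample data in the usage example are invented for illustration only. Each change in the tuple, whether in its values or in its number of components, is bracketed by the two neighboring sample points.

```python
def transition_brackets(samples):
    """Return intervals (t_left, t_right) across which the tuple e_1(t) changes.

    samples -- list of (t, e1) pairs with increasing t, where e1 is a tuple of
               one-dimensional Betti numbers, one entry per connected piece.
    """
    brackets = []
    for (t0, e0), (t1, e1) in zip(samples, samples[1:]):
        if e0 != e1:  # differs in values or in the number of components
            brackets.append((t0, t1))
    return brackets


if __name__ == "__main__":
    # Hypothetical data: the shape is conserved on [0.0, 0.2], then changes twice.
    path = [
        (0.0, (1,)),
        (0.2, (1,)),
        (0.4, (2,)),
        (0.6, (2,)),
        (0.8, (0, 0)),
        (1.0, (0, 0)),
    ]
    print(transition_brackets(path))
```

Each bracket could then be refined, for example by bisection in $t$, to approximate the corresponding $t_i^*$ more closely; the samples between consecutive brackets belong to one shape region along the path.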

There exists an alternative partitioning of configuration space, which may also possess
some interesting properties. For a given conformation, the distribution of homology
groups can be studied as a function of both parameters $a$ and $b$ [8]. This gives us
a detailed description of the shape features of a large family of isodensity surfaces for
a molecule in a frozen geometry. The partitioning of the $a$-$b$ parameter plane into domains,
characterized by the set (10), reveals a particular topological structure. In
other words, there exists a clear pattern in the number and interrelations of these
parameter domains of the $a$-$b$ plane within which homology groups are preserved.
This pattern will remain invariant for certain conformational reorderings. However,
there will exist some particular configurations $R(t)$ at which the topological structure
of the $a$-$b$ parameter plane will change. These configurations will again be very special
points in the space $M$ with regard to changes in the shape. An independent new
partitioning of configuration space $M$ can be given in terms of the topologically significant
changes in the distribution of homology groups within the $a$-$b$ parameter
plane. This possibility will be discussed in a subsequent paper.

It is worth mentioning here that a similar method of “shape” classification in terms
of homology (or cohomology) groups can be used to study domains of higher-dimensional
reaction globes, defined in terms of the potential energy hypersurface. This appeared
to be valuable when analyzing sets of reaction paths and mechanisms on the
surface [14-16].


“Shape  Regions”  in  Configuration  Space 

As a first illustration of the method discussed in the preceding section we have
considered a triatomic system AB2. From the point of view of symmetry group theory,
the space M for this system can be partitioned into regions with the following
symmetries: C2v, Cs, D∞h, C∞v, and the spherical group K. As we shall see, our shape
characterization based on density contours allows a further, very detailed,
subdivision of those regions.

The full analysis of even a system as small as AB2 becomes a nontrivial task, owing
to the presence of a large number of parameters. To specify a configurational array
we need in this case two A-B bond distances (q1 and q2) and the bond angle θ. We
choose θ = 0 corresponding to the C∞v symmetry (linear A-B-B) and θ = π to the
linear array B-A-B (D∞h symmetry for q1 = q2). In addition to these three parameters
we have the freedom in the choice of the truncation index v, as well as the constants
b and a. We will restrict ourselves to the following case:

(1) The level-set constant a for the density is given a unique, representative value.
It is chosen to be an intermediate value according to the following criteria:
(i) a is not large enough to lead to a contour G(a, R(t)) composed of disjoint
pieces when both q1 and q2 are small; (ii) it is not small enough to lead to
a contour G(a, R(t)) of a single piece when both q1 and q2 are large.

(2) The reference curvature b is chosen in the neighborhood of the value b = 0.
That is, the curvature domains to be discussed do not differ too much from the
conventional concave, saddle, and convex domains.

(3) The index for the truncated domains will be taken as v(b) = 2.

We  have  analyzed  the  most  characteristic  conformational  rearrangements  using  the 
above  constraints  on  a,  b  and  v( b).  The  results  obtained  are  displayed  schematically 
in  Figures  1-4.  The  domains  determined  are  qualitative  in  the  sense  that  their  areas 
are  not  necessarily  proportional;  only  their  interrelations  and  connectedness  properties 
are  of  importance.  In  other  words,  the  topology  of  the  shape  region  partitioning  is  the 
feature  of  importance  and  it  is  properly  represented.  Furthermore,  when  drawing  Fig¬ 
ures  1-4  we  have  assumed  that  the  boundaries  between  shape  regions  are  differen¬ 
tiable  functions  of  the  internal  coordinates. 

To describe the shape of the electronic isodensity surface we considered the general
properties of these contours for different nuclear arrays (see, for instance,
Ref. 17). With respect to the local concentration of charge we assume that ZA > 2 ZB.
That is, even when q1 = q2 and θ = 0 there will be more electronic charge about
nucleus A than about B (in other words, G(a, R(t)) will be somewhat “broad” around A).

It is clear that there will exist certain conformational arrays in which G(a, R(t))
will be formed by one or more closed surfaces, where all points on all of them will
belong to D2(b) domains. According to our convention, these pieces must be cut
away; as a result, nothing will be left for analysis. This represents a “no group”
situation that cannot be described by Eq. (10). To take this case into account we opt
for the notation N(i) (i = 1, 2, 3, ...), meaning that no group is present because i
closed surfaces (“pieces” of G(a, R(t))) have been eliminated because of being D2(b)
domains. In addition, the notation e1(t)[N(i)] will be used when, even though there
exists a series of domains left, still i closed pieces have been eliminated. This
convention allows us to include in a single, local diagram some information that
would only be revealed by an analysis for a range of negative b values [8].

Figure 1 shows schematically the typical results for the linear B-A-B array (θ = π).
The values of q_i are restricted to the range q_i ≤ L; it is implied that for q_i > L
the diagram shows no new features. After truncation, in this example we find no other
objects than topological cylinders. A simple analysis of Figure 1 reveals the
following important relationships between conformational rearrangements and changes
in the shape of the isodensity surface:

1. Quasisymmetrical stretchings: in this case the conformational reordering is
given by q1 = qt, q2 = q1 + δ, with δ < 1 and t ≥ 0. We notice only two
“transitions” (see Method), t1* and t2*, corresponding to the changes N(1) →
(1,1) and (1,1) → N(3). These transitions involve a change of two generators
for the homology groups. This property may be encountered in those
conformational changes where a nontrivial symmetry element is conserved along
the path R(t). Strictly speaking, the present transformation corresponds to the
case δ = 0 (a symmetry plane perpendicular to the internuclear axis
is preserved).

Figure 1. Schematic representation of isodensity surface distribution of shape groups for
the triatomic linear system B-A-B. (Truncation type v(b) = 2; see text for details about the
notation for domains. Black dots indicate points where four shape regions meet.)


2. Asymmetrical stretchings: this transformation corresponds to larger values of δ
defined above. The relationships between “shape regions” reveal the likelihood
of configuration changes with the transitions N(1) → (1) → (1,1) → (1)[N(1)]
→ N(3) and (1) → N(2) → (1)[N(1)] → N(3), but the low probability of the
change in density shape described by N(1) → (1) → N(2) → (1)[N(1)] → N(3).
These “shape transitions” involve in all cases a change in a single generator
after crossing a transition point.

Some other important processes that can be seen in this figure are the
“compressions” from the symmetrical case (q1 = const. for all t, and q2 = q1 t) and
“dilatations” (q2 = const. for all t, and q1 = q2 t). As displayed in the figure, the
boundaries of some “shape regions” are parallel to paths corresponding to such
transformations. Consequently, some transitions between shape regions are not
allowed, or are very unlikely, along some of these conformational rearrangements. The
unlikelihood of certain processes is a consequence of the fact that some boundaries
between shape regions are curves turning rapidly into horizontal or vertical straight
lines.

In addition, observe that, as a consequence of the assumption of differentiability of
these boundaries and the symmetry of the shape region diagram with respect to the
straight line q1 = q2, the curves become perpendicular to this line.

Figures 2 and 3 show the analogous information for θ = 0 and 0 < θ1 < π. In the
latter case, we consider the angle θ1 to be small. We notice that the structure of the
partitioning into shape regions is richer in these cases. Furthermore, a new
topological object is present in the latter example: e1(t) = (2), corresponding to a
cylinder with an “additional hole.” The analysis of the important shape transitions
for different conformational changes can be accomplished as shown above. It is worth
mentioning that, the structure of domains in Figures 2 and 3 being much more
detailed, these cases present several constraints on possible and unlikely shape
transitions. The interrelations among the boundaries of regions present a pattern of
regular features whose use might be valuable in analyzing more complicated systems.
Moreover, some other properties may be useful, for instance, the fact that the
patterns in Figures 1-3 should be homotopically interconvertible into one another.

Figure 2. Schematic representation of isodensity surface distribution of shape groups for
the triatomic linear system A-B-B. (Truncation type v(b) = 2; see text for details. Black
dots indicate points where four shape regions meet.)

Figure 3. Schematic representation of isodensity surface distribution of shape groups for
the triatomic angular system AB2 for some intermediate angle θ1. (Truncation type v(b) = 2;
see text for details. Black dots indicate points where four shape regions meet.)

Figure 4 completes the description of the AB2 system by analyzing the domains for
q1 = q2 = qt, and the whole range of values 0 ≤ θ < π.

The study of larger molecules can be performed in a comparable manner. Although
the space M will in general be higher dimensional, one may focus attention only
on the changes in the isodensity contours produced by some particular conformational
rearrangements of interest. In Figure 5 we study schematically one typical example.
We have considered a trisubstituted benzene derivative; two substituents are generic
atoms X and the third is the generic group CZ3 (see Fig. 5). Here we restrict
ourselves to the stretching between the ring and the group CZ3, and the internal
rotation of the same group, measured by the dihedral angle α. The conventions for the
choice of a, b, and v(b) follow those used in Figures 1-4.

We choose here the contour constant a under the assumption that the surface
G(a, R(t)) around the benzene ring and the hydrogen atoms shows no unexpected
curvature features of importance. Moreover, we consider that all noticeable changes
occur between the above regions and about the substituents. In particular, the group
CZ3 is considered to originate a single D2 domain. Observe that, in addition, the
range of q is bounded to an appropriate interval [L1, L2].

Figure 4. Schematic representation of isodensity surface distribution of shape groups for
the triatomic system AB2. The bending vibrations coupled to symmetric stretchings are
displayed. (Truncation type v(b) = 2; see text for details. Black dots indicate points where
four shape regions meet.)

Figure 5. Schematic representation of isodensity surface distribution of shape groups for
the trisubstituted benzene ring (C6H3)X2(CZ3). The configurational rearrangements are
restricted to a stretching (coordinate q) and an internal rotation (dihedral angle α).
(Truncation type v(b) = 2; see text for details. Black dots indicate points where four shape
regions meet.)

The periodic nature of the angular rotation from α = π/2 to α = -π/2 is clearly
shown in the diagram of “shape regions.” It is worth noting that the transition
(3) → (5) is critical in the sense that it could only happen at a particular pair of
α and q values. This transition involves a change in two generators; notice that, as
commented above for the AB2 system, it is associated with a conformational
rearrangement where a symmetry plane is conserved.

Further  Comments 

The examples given in the preceding section illustrate the possibilities of the
method to characterize the shape of molecular conformations. In actual cases of
interest, after an appropriate computation of the chosen function f(r, R) [1-6], the
conditions for the selection of parameters a, b, and v can be exploited to provide a
very detailed description of the coupling between MS and the nuclear geometry.

The above possibilities in the implementation of the method make it suitable for the
analysis of the relationships between biochemical activity and the actual changes in
the molecular conformations. The procedure can be used to study either shape
characteristics, disregarding the molecular size effects, or the characteristics
depending simultaneously on size and shape. To this end, one may choose the criteria
to define the level set F(a, R(t)) from those conditions suggested in the Method
section. Furthermore, the method can be easily extended to study other surfaces by
using a more detailed partitioning than the one based on curvature. For instance, a
molecular van der Waals surface with a superimposed map of ranges of values of
electrostatic potential can be treated without introducing significant changes in the
formalism.

It is worth reiterating here that the above analysis is symmetry independent.
Consequently, many features of the relations among “shape regions” in the metric
space M should be expected to remain valid, even after symmetry changes. For
instance, if one replaces the case AB2 by a triatomic system ABC, the symmetry-group
distribution in M will be totally different. However, many characteristics of the
shape regions would be unaltered. This suggests that the homology group analysis
would allow using, indirectly, some symmetry information to describe asymmetrical
objects. For example, the link can be established by studying systems whose point
group contains the group of interest as a subgroup.

Abstract

This work is a continuation of our previous experimental and theoretical investigations in the field of
aminothiol radioprotectors and anticancer drugs. The best known aminothiols are cysteamine and the
natural intracellular radioprotector glutathione (GSH). In this study, we present a tentative discussion of
similarity in a class of aminothiols such as cysteamine, methylated cysteamine, cysteine, AET, WR-1065,
WR-2578 S, I-102 S, I-143, penicillamine, and GSH, by using the Randic graph topological method. We
have used the Randic graph approach by introducing weighting factors for hetero bonds (C-O = x,
C-N = y, and C-S = z) involved in the studied aminothiols, in which case the paths become
polynomials in the variables x, y, and z. The similarity is discussed versus (x, y, z) values, using
similarity matrices, introduced as a set of Euclidean distances between a pair of vectors in the
n-dimensional vector space of paths or in the n-dimensional vector space of atomic indexes.


Introduction 

In a recent paper, D. H. Rouvray introduces the art of “predicting chemistry from
topology” by saying: “At the heart of the new technique is the topology of individual
molecules: the pattern of interconnections among each molecule’s atoms, which
determines the ultimate architecture of the molecule” [1].

Molecular topology is indicated when searching for descriptors (atomic or molecular)
which can be correlated with a particular property (physical, chemical, biological,
or pharmaceutical) or which are able to approach the similarity concept in a class
of molecules. In this way, we have attempted to broach the notion of similarity in
the aminothiol molecules. Sulfur-containing molecules, and particularly aminothiols,
constitute a very interesting class because of their radioprotector, and in some
cases, their anticancer properties [2-7].

Aminothiol  radioprotector  agents  are  able  to  reduce  the  radiation  damage  when 
administered  to  animals  or  cellular  culture  before  irradiation  with  ionizing  radiation. 
They  are  characterized  by  their  dose  reduction  factor:  DRF  =  ratio  of  lethal  dose  of 
irradiation  for  50%  of  animals  treated  with  radioprotector  to  lethal  dose  of  irradiation 
for  50%  of  control  animals. 
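
The definition above reduces to a single ratio. The following is a minimal sketch of it; the function name and the two dose values are hypothetical and serve only to illustrate the arithmetic.

```python
def dose_reduction_factor(ld50_treated, ld50_control):
    """DRF = LD50 of irradiation with radioprotector / LD50 of irradiation for controls."""
    if ld50_control <= 0:
        raise ValueError("the control LD50 must be positive")
    return ld50_treated / ld50_control


# Hypothetical doses (same arbitrary unit), not taken from the text.
example_drf = dose_reduction_factor(ld50_treated=12.8, ld50_control=8.0)
print(round(example_drf, 2))
```

A DRF above 1 means that a larger radiation dose is needed to reach the same lethality in the treated group.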

Aminothiol  anticancer  drugs  can  protect  normal  tissues  and  tumors,  in  a  differen¬ 
tial  manner,  in  chemotherapy  and  radiotherapy. 

Figures 1 and 2 represent 10 aminothiols which will be studied from the topological
point of view. In this class, the naturally occurring molecules are the amino acid
cysteine (DRF = 1.4) and glutathione (GSH), which is the tripeptide thiol
γ-Glu-Cys-Gly. It is to be noted that GSH is the most abundant thiol inside cells and
possesses multifunctional activities in biology, cancer therapy, and pharmacology
[8]. GSH is also an endogenous intracellular radioprotector [2].

Figure 1. Numbering of the heavy atoms for aminothiols cysteamine
(β-mercaptoethylamine), AET (aminoethylisothiouronium), methyl-2-cysteamine,
N-trimethyl-cysteamine (Me = Methyl), WR-2578 S (free sulfhydryl form of WR-2578) and
WR-1065 (free sulfhydryl form of WR-2721).

The best known synthetic aminothiol is cysteamine, which is directly derived from
cysteine. Cysteamine is a good radioprotector in vivo (DRF = 1.6), but introduces an
unacceptable toxicity.

Figure 2. Numbering of heavy atoms for aminothiols containing a peptide or a
pseudopeptide group: I-102 S (free sulfhydryl form of I-102), I-143, glutathione or GSH
(γ-Glu-Cys-Gly) and the amino acid cysteine.


Other radioprotective drugs synthesized on the basis of the cysteamine formula are
methyl-2-cysteamine (DRF = 1.8), N-trimethyl-cysteamine, AET (DRF = 1.6),
WR-1065, which is the free sulfhydryl form of WR-2721 (DRF = 2.7), and
WR-2578 S (DRF = 2.0) (free sulfhydryl form of WR-2578). WR-2721 is the most
effective known radioprotector; it is also an anticancer drug.

Recently, two molecules derived from the γ-Glu-Cys branch of GSH were synthesized
by Imbach and co-workers [9, 10]. These new radioprotectors are I-102
(DRF = 1.4) and I-143 (I-102 S is the free sulfhydryl form of I-102). Generally, the
active metabolite of these drugs is the free sulfhydryl form, which is able
to interact with DNA, and for this reason we have included it in our study.

We have demonstrated previously by using spectrophotometric, dielectric, and nuclear
magnetic resonance (NMR) techniques that, in vitro, cysteamine, WR-1065,
and WR-2578 S interact with DNA. This interaction is essentially electrostatic and
involves the cationic sites of radioprotectors and the anionic DNA phosphate sites
[11-15]. With the help of quantum chemical computations, we have also demonstrated
the possible mechanism of DNA-cysteamine interaction at the molecular level
[16] and determined the electrostatic properties (charge distribution and electrostatic
potential) of cysteamine, methylated cysteamine, WR-1065, WR-2578 S, and I-102 S
[17, 18].

Returning to Figures 1 and 2, it is obvious that similarities exist between these
molecules. Quantitative discussion of molecular similarity, in a class of molecules
possessing similar properties, may be conducted using the graph descriptors and
indexes introduced by M. Randic [19].


Procedure 

The Randic topological descriptors are well described in recent papers [20-23].
Two kinds of topological indexes may be used to qualify each molecule by a graph
invariant.

First, the building of the atomic path sequences P0, P1, ..., Pk, ..., Pn, in which
P0 is the number of heavy atoms (all hydrogen atoms are excluded), P1 is the number
of paths of length 1 (1 bond), and Pk is the number of paths of length k (k bonds) [see
Tables I and II].

For each atom (i), the atomic index (atomic ID or a_i) is obtained by summing all
path numbers on each row (i).

For the molecule we find an invariant signature (total number of paths) in the last
row of the molecular graph, by adding the atomic path contributions in each column.
Thus the molecule path in the last row is:

$$\sum P_0, \quad \frac{1}{2}\sum P_k \quad (k \neq 0)$$

(because each path involves 2 atoms)


Finally,  the  sum  of  molecular  paths  gives  the  molecular  index  (molecular  ID). 

To illustrate this path count, we have represented in Tables I and II the results
obtained for WR-1065 and I-102 S. These molecules possess the same molecular index,
36, when all bonds are weighted by the factor 1. In this topological description
WR-1065 is represented by the signature 8, 7, 6, 5, 4, 3, 2, 1 and I-102 S by the
signature 8, 7, 7, 6, 4, 3, 1.
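
The path count described above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the authors' program: the function name is hypothetical, and the heavy-atom connectivity of I-102 S (main chain S1-C2-C3-N4-C5-C6 with an oxygen on C5 and a nitrogen on C6) is an assumption read from the Figure 4 caption.

```python
from collections import defaultdict


def path_signature(n_atoms, bonds):
    """Count self-avoiding paths of each length in a hydrogen-suppressed graph.

    Atoms are numbered 1..n_atoms; bonds is a list of (i, j) pairs.
    Returns [P0, P1, ..., Pmax] for the whole molecule.
    """
    neighbours = defaultdict(list)
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    found = defaultdict(int)  # path length -> number of (start atom, path) pairs

    def walk(atom, visited, length):
        found[length] += 1
        for nxt in neighbours[atom]:
            if nxt not in visited:
                walk(nxt, visited | {nxt}, length + 1)

    for start in range(1, n_atoms + 1):
        walk(start, {start}, 0)

    # A path of length k >= 1 is reached from both of its end atoms,
    # so the column sums are halved; P0 is simply the number of atoms.
    return [found[0]] + [found[k] // 2 for k in range(1, max(found) + 1)]


# WR-1065 heavy atoms: unbranched chain S1-C2-C3-N4-C5-C6-C7-N8.
wr1065_bonds = [(i, i + 1) for i in range(1, 8)]
# I-102 S (assumed connectivity): chain 1..6, N7 on C6, O8 on C5.
i102s_bonds = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (5, 8)]

sig_wr1065 = path_signature(8, wr1065_bonds)
sig_i102s = path_signature(8, i102s_bonds)
print(sig_wr1065, sum(sig_wr1065))
print(sig_i102s, sum(sig_i102s))
```

Under these assumptions both signatures quoted in the text are recovered, and both sum to the molecular ID of 36.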

If we wish to take into account the contributions of heteroatoms such as O, N, and
S, it is necessary to associate with each heterobond a weight different from one. We
have named the heterobonds C-O = x, C-N = y, and C-S = z. In this case,
the count of paths introduces polynomial expressions in the variables x, y, and z.
Table III shows the obtained molecular graph for I-102 S.

Table I. The count of paths in WR-1065 with bond weights: C-C = 1; C-S = 1; C-N = 1.

Table II. The count of paths in I-102 S with bond weights: C-C = 1; C-N = 1;
C-S = 1; C-O = 1. Molecular ID = 36.
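
When the heterobonds carry the weights x, y, and z, each path contributes the product of its bond weights instead of 1. The sketch below evaluates that weighted count numerically for chosen (x, y, z) values rather than symbolically; the function names and the I-102 S atom assignment (taken from the Figure 4 caption) are assumptions made for illustration.

```python
from collections import defaultdict


def hetero_weights(x, y, z):
    """Map an unordered element pair to its bond weight (C-O = x, C-N = y, C-S = z)."""
    return {frozenset("CO"): x, frozenset("CN"): y, frozenset("CS"): z}


def weighted_signature(atoms, bonds, weights):
    """atoms: {index: element symbol}; bonds: list of (i, j) index pairs.

    Returns [P0, P1, ...] where every path counts as the product of its bond
    weights; bonds not listed in `weights` (for example C-C) have weight 1.
    """
    neighbours = defaultdict(list)
    for i, j in bonds:
        w = weights.get(frozenset((atoms[i], atoms[j])), 1.0)
        neighbours[i].append((j, w))
        neighbours[j].append((i, w))

    totals = defaultdict(float)

    def walk(atom, visited, length, product):
        totals[length] += product
        for nxt, w in neighbours[atom]:
            if nxt not in visited:
                walk(nxt, visited | {nxt}, length + 1, product * w)

    for start in atoms:
        walk(start, {start}, 0, 1.0)

    # Paths of length >= 1 are found from both ends, hence the factor 1/2.
    return [totals[0]] + [totals[k] / 2 for k in range(1, max(totals) + 1)]


# Assumed heavy-atom skeleton of I-102 S: S1-C2-C3-N4-C5(-O8)-C6-N7.
i102s_atoms = {1: "S", 2: "C", 3: "C", 4: "N", 5: "C", 6: "C", 7: "N", 8: "O"}
i102s_bonds = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (5, 8)]

unit_sig = weighted_signature(i102s_atoms, i102s_bonds, hetero_weights(1, 1, 1))
weighted_sig = weighted_signature(i102s_atoms, i102s_bonds, hetero_weights(0.4, 0.8, 0.6))
print(unit_sig)
print([round(p, 3) for p in weighted_sig])
```

With all weights equal to 1 the function reduces to the plain path count; for a C-C-S fragment with z = 0.6 it gives P1 = 1 + z and P2 = z, which is the polynomial behaviour described in the text.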

Second, when the sizes of molecules are very different, like GSH and cysteamine
for example, it is convenient to introduce a new atomic connectivity index which
reduces the contributions of groups far from a main chain shared by a set
of molecules.

This index is defined by Randic as a weight (mn)^(-1/2) for each bond, with m and n
neighbors for the terminal atoms of the bond. Then, a new count of paths is built
using this bond factor.
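
A minimal sketch of this connectivity-weighted count follows. It assumes that the atomic index includes the zero-length path (weight 1), in line with summing a whole row of the path table; the function name is hypothetical and heterobond weights are left out for brevity.

```python
from collections import defaultdict


def atomic_ids(bonds):
    """Atomic indexes with each bond weighted by (m * n) ** -0.5.

    m and n are the numbers of heavy-atom neighbours of the two bonded atoms.
    A path contributes the product of its bond weights; the zero-length path
    contributes 1 (an assumption of this sketch).
    """
    neighbours = defaultdict(list)
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    degree = {atom: len(nbrs) for atom, nbrs in neighbours.items()}

    def bond_weight(i, j):
        return (degree[i] * degree[j]) ** -0.5

    ids = {}
    for start in neighbours:
        total = 0.0
        stack = [(start, frozenset([start]), 1.0)]
        while stack:
            atom, visited, product = stack.pop()
            total += product
            for nxt in neighbours[atom]:
                if nxt not in visited:
                    stack.append((nxt, visited | {nxt}, product * bond_weight(atom, nxt)))
        ids[start] = total
    return ids


# Three-atom chain 1-2-3: both bonds have weight (1 * 2) ** -0.5.
chain_ids = atomic_ids([(1, 2), (2, 3)])
print({atom: round(value, 4) for atom, value in chain_ids.items()})
```

Because every bond weight is at most 1, long paths through branched regions contribute little, which is how the index damps the influence of groups far from the atom considered.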

It is also possible to introduce a supplementary variation in the path count, by
adopting different weights for the heterobonds C-O, C-N, and C-S.

Table IV shows the result obtained for the molecule I-102 S, when only the atomic
indexes of the six atoms S1-C2-C3-N4-C5-C6 of the main chain are taken
into account.


Table IV. Atomic indexes for the 6 atoms in the main chain S1-C2-C3-N4-C5-C6 of I-102 S, computed with the (mn)^(-1/2) connectivity index and with
bond weights: C-C = 1; C-O = x; C-N = y; and C-S = z.


Figure 3. Euclidean distance d_αβ between two molecules α and β represented by the
vectors V_α and V_β. (1) In the n-dimensional vector space of paths [p_i]: p_iα is the
component of V_α corresponding to a path of length i; p_iβ is the component of V_β
corresponding to a path of length i.

$$d_{\alpha\beta} = \{|V_\alpha - V_\beta|^2\}^{1/2} = \Bigl\{\sum_{i=0} (p_{i\alpha} - p_{i\beta})^2\Bigr\}^{1/2}$$

Observe that if m < n, V_β belongs to a subspace. (2) In the n-dimensional vector space of
atomic indexes [a_i]: a_iα is the component of V_α corresponding to the atomic index of the
atom (i) and a_iβ is the atomic index of atom (i) in the molecule β. In this vector space we
consider the same number n of atoms in molecules α and β. The Euclidean distance is defined
in the same way, with a_iα and a_iβ in place of p_iα and p_iβ.
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Results  and  Discussion 

The quantitative concept of similarity between two molecules α and β, within a set
of molecules, may be expressed, as indicated by Randic, with the help of similarity
matrices. Each element of a similarity matrix represents the Euclidean distance
between two molecules α and β in an n-dimensional vector space (see Fig. 3).
Consequently, two representations are possible: (1) Each molecule is represented by a
vector V_α in the n-dimensional vector space of paths [P_i]. (2) Each molecule α (or a
fragment of molecule) is represented by a vector V_α in the n-dimensional vector
space of atomic indexes [a_i].
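
As a minimal sketch of the first representation, the distance between two path signatures can be computed as below. The shorter signature is padded with zeros, which is one way of expressing that it lies in a subspace; the function names are hypothetical, and the two unweighted signatures are the ones quoted in the Procedure section.

```python
import math
from itertools import zip_longest


def path_distance(sig_a, sig_b):
    """Euclidean distance between two path signatures [P0, P1, ...]."""
    return math.sqrt(sum((a - b) ** 2
                         for a, b in zip_longest(sig_a, sig_b, fillvalue=0)))


def similarity_matrix(signatures):
    """signatures: {name: [P0, P1, ...]} -> {(name_a, name_b): distance}."""
    names = list(signatures)
    return {(p, q): path_distance(signatures[p], signatures[q])
            for p in names for q in names}


unweighted = {
    "1065": [8, 7, 6, 5, 4, 3, 2, 1],  # WR-1065, all bond weights equal to 1
    "I102": [8, 7, 7, 6, 4, 3, 1],     # I-102 S, all bond weights equal to 1
}
matrix = similarity_matrix(unweighted)
print(matrix[("1065", "I102")])
```

With unit weights the two molecules differ in four components by one unit each, so their distance is 2; the heterobond weights change the components and therefore the distance.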

Similarity  Matrices  in  the  8-Dimensional  Vector  Space  of  Paths 

In a first step, we have chosen to discuss the global approach of similarity between
10 aminothiols of different sizes: the 9 radioprotectors cysteamine (CYSA),
Me-2-cysteamine (ME2C), N-3Me-cysteamine (3MEC), AET, cysteine (CYSE), WR-1065
(1065), WR-2578 S (2578), I-102 S (I102), and I-143 (I143), and a tenth molecule
which is not a radioprotector, penicillamine (PENI). The shortened names of the
molecules which will be used in the following tables are in parentheses.
Penicillamine (ββ-methyl cysteine, see Fig. 4) is obtained by degradation of
penicillin. It is a drug of great biological interest and presents different
properties with regard to its D and L chiral forms. D-penicillamine is well known as
a copper-chelating agent and is also known to decrease the synthesis of collagen;
L-penicillamine is a very toxic drug [2].


Figure 4. (A) Five molecules based on the cysteamine main chain N4-C3-C2-S1.
AET: R1 = -CN2; methyl-2-cysteamine: R2 = -C; N-trimethylcysteamine: R4 = -C3;
cysteine: R3 = -CO2; and penicillamine: R2 = -C2 and R3 = -CO2. (B) Five molecules
based on the WR-2578 S main chain C6-C5-N4-C3-C2-S1. WR-2578 S: R6 = -N;
WR-1065: R6 = -CN; I-102 S: R5 = -O and R6 = -N; I-143: R5 = -O and R6 = -CS;
GSH: R3 = -(CO)-NC-CO2, R5 = -O and R6 = -C-CN-CO2.

We have excluded glutathione from this computation because it contains too great
a number of paths. In this space of paths [P_i], WR-1065 and I-143 possess the longest
path P7; the other molecules are represented in subspaces P3, P4, P5, and P6.

Table V shows the results obtained, in the space of paths, for the 10 molecules
studied. Each molecular path is the last row of the count of paths for each
aminothiol, with heterobond weights.

From Table V it is now possible to build similarity matrices, in the 8-dimensional
vector space of paths, with empirical values for the set (x, y, z). Starting from the
set (0.4, 0.4, 0.4) and ending at the set (1.2, 1.2, 1.2), we have varied each of the
(x, y, z) values in steps of 0.2. We have thus obtained 125 similarity matrices with
various possibilities for the heterobond weights. As an example, we show in Table VI
the similarity matrix obtained for the set (0.4, 0.8, 0.6).
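
The count of 125 follows from five values per weight and three weights. A short sketch of how such a grid of (x, y, z) sets might be enumerated is given below; the variable names are illustrative.

```python
from itertools import product

# Five values per heterobond weight: 0.4, 0.6, 0.8, 1.0, 1.2.
weight_values = [round(0.4 + 0.2 * k, 1) for k in range(5)]

# Every (x, y, z) combination; one similarity matrix would be built per entry.
weight_grid = list(product(weight_values, repeat=3))

print(weight_values)
print(len(weight_grid), weight_grid[0], weight_grid[-1])
```

Rounding to one decimal keeps the grid values free of floating-point residue, so that a set such as (0.4, 0.8, 0.6) can be looked up exactly.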

Then, in each of the 125 similarity matrices, we examined the 10 smallest Euclidean
distances d_αβ in ascending order, starting from the minimal one. Table VII
shows a sample of the results obtained for 4 different sets of (x, y, z) values.
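
The ranking step can be sketched as follows, using the upper triangle of Table VI for (x, y, z) = (0.4, 0.8, 0.6). This is an illustration, not the authors' code; the 1065-I102 distance is entered as 1.24, the rounded value listed in Table VII, and the variable names are assumptions.

```python
names = ["CYSA", "ME2C", "CYSE", "3MEC", "AET", "1065", "2578", "I102", "I143", "PENI"]

# Row i holds the distances from molecule i to molecules i+1, i+2, ...
upper_triangle = [
    [2.280, 4.924, 5.875, 4.418, 7.504, 5.265, 6.663, 8.646, 11.136],
    [2.785, 3.682, 2.559, 5.598, 3.388, 4.705, 6.707, 8.962],
    [1.412, 1.542, 3.429, 1.714, 2.368, 4.293, 6.416],
    [2.200, 2.882, 1.909, 2.018, 3.642, 5.513],
    [3.536, 1.394, 2.525, 4.535, 7.442],
    [2.316, 1.24, 1.286, 5.080],  # 1.24: rounded value from Table VII
    [1.449, 3.418, 6.538],
    [2.057, 5.367],
    [4.352],
]

# Flatten to (distance, molecule_a, molecule_b) and sort by distance.
pairs = [(d, names[i], names[i + 1 + k])
         for i, row in enumerate(upper_triangle)
         for k, d in enumerate(row)]
ten_best = sorted(pairs)[:10]

for distance, mol_a, mol_b in ten_best:
    print(f"{distance:6.3f}  {mol_a}-{mol_b}")
```

The ten pairs obtained this way appear in the same order as the first block of Table VII.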

The  principal  results  of  this  analysis  are: 

1. The remarkable similarity of the molecule pair WR-1065 and I-102 S. This pair
is the only one that appears 125 times in the best 6 d_αβ: 46 times first, 40 times
second, 18 times third, 14 times fourth, 6 times fifth, and 1 time sixth.

2. The WR-1065 and WR-2578 S couple (which are the most potent radioprotectors)
appears 106 times in the best 10 d_αβ, but never in first or second rank: 6 times
third, 4 times fourth, 24 times fifth, 21 times sixth, 24 times seventh, 13 times
eighth, 9 times ninth, and 5 times tenth.

3. The WR-2578 S-AET couple appears 96 times: 26 times first, 13 times second,
19 times third, 7 times fourth, 8 times fifth, 9 times sixth, 1 time seventh, 8
times eighth, 3 times ninth, and 2 times tenth.

4. The WR-2578 S-I-102 S pair is placed 98 times in the best 10 d_αβ, with only 2
times in first rank.

5. The WR-2578 S-N-trimethyl cysteamine couple appears 82 times, with 10
times in first rank.

6. The cysteamine-Me-2-cysteamine pair appears 81 times, with 3 times in first
rank.

7. The WR-1065-I-143 couple appears 67 times, with 8 times in first rank for the
following (x, y, z) values: (0.4, 0.8, 0.4); (0.4, 1, 0.4); (0.4, 1, 0.6); (0.4, 1, 0.8);
(0.4, 1.2, 0.6); (0.4, 1.2, 0.8); (0.4, 1.2, 1) and (0.6, 1.2, 0.8).

This last result shows that the similarity between I-143 and WR-1065 is reinforced
when the aminothiol part is more weighted than the carboxyl one ((y + z)/2 > x in
all cases).

8.  The  1-102  S-I-143  couple  appears  59  times  with  15  times  in  the  first  rank  for 
the  set  values:  (0.4,  1.2,  0.4);  (0.6,  1.2,  0.4);  (0.6,  1.2,  0.6);  (0.8,  1,  0.4);  (0.8, 
1.2,  0.4);  (0.8,  1.2,  0.6);  (0.8,  1.2,  0.8);  (1,  1,  0.4);  (1,  1.2,  0.4);  (1,  1.2,  0.6);  (1, 
1.2,  0.8);  (1.2,  1,  0.4);  (1.2,  1,  0.6);  (1.2,  1.2,  0.4);  and  (1.2,  1.2,  0.6). 
Table VI. Similarity matrix in the 8-dimensional vector space of paths with (x, y, z) = (0.4, 0.8, 0.6).

Molecule       1      2      3      4      5      6      7      8      9     10
            CYSA   ME2C   CYSE   3MEC    AET   1065   2578   I102   I143   PENI

 1  CYSA    .000  2.280  4.924  5.875  4.418  7.504  5.265  6.663  8.646 11.136    1
 2  ME2C    .000   .000  2.785  3.682  2.559  5.598  3.388  4.705  6.707  8.962    2
 3  CYSE    .000   .000   .000  1.412  1.542  3.429  1.714  2.368  4.293  6.416    3
 4  3MEC    .000   .000   .000   .000  2.200  2.882  1.909  2.018  3.642  5.513    4
 5  AET     .000   .000   .000   .000   .000  3.536  1.394  2.525  4.535  7.442    5
 6  1065    .000   .000   .000   .000   .000   .000  2.316  1.235  1.286  5.080    6
 7  2578    .000   .000   .000   .000   .000   .000   .000  1.449  3.418  6.538    7
 8  I102    .000   .000   .000   .000   .000   .000   .000   .000  2.057  5.367    8
 9  I143    .000   .000   .000   .000   .000   .000   .000   .000   .000  4.352    9
10  PENI    .000   .000   .000   .000   .000   .000   .000   .000   .000   .000   10

Table VII. The ten best Euclidean distances in four similarity matrices with different (x, y, z) values.

(x, y, z) = (0.40, 0.80, 0.60)
d_αβ   1.24   1.29   1.39   1.41   1.45   1.54   1.71   1.91   2.02   2.06
α      1065   1065   AET    CYSE   2578   CYSE   CYSE   3MEC   3MEC   I102
β      I102   I143   2578   3MEC   I102   AET    2578   2578   I102   I143

(x, y, z) = (0.60, 1.00, 1.00)
d_αβ   1.41   1.60   2.14   2.23   2.46   2.65   2.83   2.87   3.06   3.09
α      AET    1065   AET    2578   1065   CYSA   1065   CYSE   CYSE   I102
β      2578   I102   I102   I102   I143   ME2C   2578   AET    2578   I143

(x, y, z) = (0.60, 1.20, 0.80)
d_αβ   2.08   2.33   2.38   2.58   2.61   2.80   3.39   3.42   3.90   4.10
α      1065   1065   I102   CYSA   2578   CYSE   1065   AET    ME2C   3MEC
β      I143   I102   I143   ME2C   I102   AET    2578   2578   CYSE   PENI

(x, y, z) = (0.60, 1.20, 0.40)
d_αβ   1.27   1.99   2.24   2.32   2.55   2.87   3.09   3.23   3.45   3.82
α      I102   1065   1065   CYSA   2578   CYSE   ME2C   1065   2578   ME2C
β      I143   I143   I102   ME2C   I102   AET    AET    2578   I143   CYSE

In this case, the carboxyl and the amino groups are more weighted than the thiol
one ((y + x)/2 > z, in all cases).

9. The N-trimethyl cysteamine-AET couple placed 63 times, with 9 times in the
first rank.

10. Another noteworthy fact is that the nonradioprotective molecule penicillamine
is almost excluded from the 10 best d_αβ. This drug appears only 5 times, coupled
with N-3Me-cysteamine, in tenth rank.
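The weighting conditions quoted in items 7 and 8 can be checked directly against the listed (x, y, z) sets. The sketch below is only an arithmetic check; it assumes, following the captions of Tables VIII and IX, that x, y, and z are the C-O, C-N, and C-S bond weights, and the function names are illustrative.

```python
# (x, y, z) sets quoted in item 7 (WR-1065 / I-143 first rank)
ITEM_7 = [(0.4, 0.8, 0.4), (0.4, 1, 0.4), (0.4, 1, 0.6), (0.4, 1, 0.8),
          (0.4, 1.2, 0.6), (0.4, 1.2, 0.8), (0.4, 1.2, 1), (0.6, 1.2, 0.8)]

# (x, y, z) sets quoted in item 8 (I-102 S / I-143 first rank)
ITEM_8 = [(0.4, 1.2, 0.4), (0.6, 1.2, 0.4), (0.6, 1.2, 0.6), (0.8, 1, 0.4),
          (0.8, 1.2, 0.4), (0.8, 1.2, 0.6), (0.8, 1.2, 0.8), (1, 1, 0.4),
          (1, 1.2, 0.4), (1, 1.2, 0.6), (1, 1.2, 0.8), (1.2, 1, 0.4),
          (1.2, 1, 0.6), (1.2, 1.2, 0.4), (1.2, 1.2, 0.6)]


def aminothiol_dominates(x, y, z):
    """True when the mean C-N / C-S weight exceeds the C-O weight."""
    return (y + z) / 2 > x


def carboxyl_amino_dominates(x, y, z):
    """True when the mean C-O / C-N weight exceeds the C-S weight."""
    return (x + y) / 2 > z


item7_ok = all(aminothiol_dominates(*w) for w in ITEM_7)
item8_ok = all(carboxyl_amino_dominates(*w) for w in ITEM_8)
print(len(ITEM_7), item7_ok)
print(len(ITEM_8), item8_ok)
```

Both lists satisfy their respective inequality for every entry, and their lengths agree with the first-rank counts (8 and 15) given in the text.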

Similarity  Matrices  in  the  Vector  Space  of  Atomic  Indexes 

In  a  second  step,  we  introduced  two  lots  of  molecules  containing  the  same  basic 
fragment. 




Figure 4 shows a first class of 6 aminothiols derived from the cysteamine main
chain: cysteamine, methyl-2-cysteamine, N-trimethyl-cysteamine, AET, cysteine,
and penicillamine.

A second class of aminothiols is based on the main chain fragment of WR-2578 S.
This lot of molecules contains the most potent radioprotector WR-1065, the newly
synthesized drugs I-102 S and I-143, and the most important endogenous thiol GSH.

The discussion of similarity between molecules inside these two classes will be
conducted by using the (mn)^{-1/2} connectivity index with weighted heterobonds and
similarity matrices containing elements computed in the atomic index vector space.

Tables VIII and IX show the results obtained for atomic indexes in the 4-dimensional
vector space for the first lot of drugs and in the 6-dimensional vector space for
the second one.
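As an illustration of how such atomic indexes can be computed, the sketch below sums, over every self-avoiding path starting at an atom, the product of bond weights and (mn)^{-1/2} factors, where m and n are the heavy-atom degrees of the bonded atoms. This reading of the index is an assumption, chosen because it reproduces the cysteamine coefficients of Table VIII; the function `atomic_indexes` and the graph encoding are hypothetical and are not the authors' program.

```python
from math import sqrt


def atomic_indexes(bonds):
    """Path-based atomic indexes of a hydrogen-suppressed molecular graph.

    bonds: list of (atom_a, atom_b, bond_weight) tuples.
    Returns {atom: index}; the zero-length path contributes 1.
    """
    neighbours = {}
    for a, b, w in bonds:
        neighbours.setdefault(a, []).append((b, w))
        neighbours.setdefault(b, []).append((a, w))
    degree = {atom: len(nbrs) for atom, nbrs in neighbours.items()}

    def walk(atom, visited, weight):
        # Every self-avoiding path ending at `atom` contributes its weight once.
        total = weight
        for nxt, w in neighbours[atom]:
            if nxt not in visited:
                step = w / sqrt(degree[atom] * degree[nxt])
                total += walk(nxt, visited | {nxt}, weight * step)
        return total

    return {atom: walk(atom, {atom}, 1.0) for atom in neighbours}


def cysteamine(x, y, z):
    """Main chain S1-C2-C3-N4 with weights C-C = 1, C-N = y, C-S = z (x unused)."""
    return [("S1", "C2", z), ("C2", "C3", 1.0), ("C3", "N4", y)]


x, y, z = 0.4, 0.8, 0.6
idx = atomic_indexes(cysteamine(x, y, z))
for atom in ("S1", "C2", "C3", "N4"):
    print(atom, round(idx[atom], 3))
```

With this definition the S1 index evaluates to 1 + 1.061z + 0.250yz and the C2 index to 1.5 + 0.354y + 0.707z, matching the tabulated cysteamine expressions to three decimals.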

Then, we have computed the 125 similarity matrices for the 6 drugs derived from
cysteamine (each matrix contains 15 d_αβ) and the 125 similarity matrices for the 5
molecules based on the WR-2578 S fragment (each matrix contains 10 d_αβ), with the
same sets (x, y, z) described above.
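The counts quoted here follow from simple combinatorics. The sketch below assumes that each weight takes the five values 0.4, 0.6, 0.8, 1.0, and 1.2 (the values that occur in the sets listed earlier), which gives 5^3 = 125 weight sets; the molecule labels are the abbreviations used in the tables.

```python
from itertools import combinations, product

# Assumed grid of bond-weight values; 5 values per weight gives 125 sets.
WEIGHT_VALUES = (0.4, 0.6, 0.8, 1.0, 1.2)
weight_sets = list(product(WEIGHT_VALUES, repeat=3))

first_lot = ["CYSA", "ME2C", "3MEC", "AET", "CYSE", "PENI"]
second_lot = ["2578", "1065", "I102", "I143", "GSH"]

# One distance d_ab per unordered pair of molecules in each matrix.
pairs_first = list(combinations(first_lot, 2))
pairs_second = list(combinations(second_lot, 2))

print(len(weight_sets), len(pairs_first), len(pairs_second))
```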


Table VIII. Atomic indexes of the 4 atoms in the main chain S1-C2-C3-N4,
computed with the (mn)^{-1/2} connectivity index and with bond weights: C-C = 1;
C-O = x; C-N = y; C-S = z.

Cysteamine
  a1 = 1 + 1.061z + 0.250yz
  a2 = 1.5 + 0.354y + 0.707z
  a3 = 1.5 + 0.707y + 0.354z
  a4 = 1 + 1.061y + 0.250yz

Methyl-2-cysteamine
  a1 = 1 + 1.146z + 0.167yz
  a2 = 1.986 + 0.289y + 0.577z
  a3 = 1.644 + 0.707y + 0.236z
  a4 = 1 + 1.163y + 0.167yz

N-trimethyl-cysteamine
  a1 = 1 + 1.061z + 0.125yz + 0.187y^2 z
  a2 = 1.5 + 0.177y + 0.707z + 0.265y^2
  a3 = 1.5 + 0.354y + 0.354z + 0.530y^2
  a4 = 1 + 2.030y + 0.125yz

AET
  a1 = 1 + 1.158z + 0.648yz
  a2 = 1.5 + 0.354y + 0.5z + 0.204z^2 + 0.236yz^2
  a3 = 1.5 + 0.707y + 0.250z + 0.102z^2 + 0.118yz^2
  a4 = 1 + 1.061y + 0.177yz + 0.072yz^2 + 0.083y^2 z^2

Cysteine
  a1 = 1 + 1.092z + 0.111xz + 0.167yz
  a2 = 1.544 + 0.157x + 0.236y + 0.707z
  a3 = 1.742 + 0.385x + 0.577y + 0.289z
  a4 = 1 + 1.006y + 0.222xy + 0.167yz

Penicillamine
  a1 = 1 + 1.192z + 0.056xz + 0.083yz
  a2 = 2.385 + 0.111x + 0.167y + 0.5z
  a3 = 1.911 + 0.385x + 0.577y + 0.144z
Table IX. Atomic indexes of the 6 atoms in the main chain S1-C2-C3-N4-C5-C6, computed
with the (mn)^{-1/2} connectivity index and with bond weights: C-C = 1; C-O = x; C-N = y;
C-S = z.

WR-2578 S
  a1 = 1 + 1.061z + 0.177yz + 0.132y^2 z + 0.031y^3 z
  a2 = 1.5 + 0.250y + 0.707z + 0.187y^2 + 0.044y^3
  a3 = 1.5 + 0.5y + 0.354z + 0.375y^2 + 0.088y^3
  a4 = 1 + 1.5y + 0.177yz + 0.177y^2
  a5 = 1.5 + 0.854y + 0.375y^2 + 0.088y^2 z
  a6 = 1.5 + 0.957y + 0.187y^2 + 0.044y^2 z

WR-1065
  a1 = 1 + 1.061z + 0.177yz + 0.154y^2 z + 0.016y^3 z
  a2 = 1.5 + 0.250y + 0.707z + 0.218y^2 + 0.022y^3
  a3 = 1.5 + 0.5y + 0.354z + 0.437y^2 + 0.044y^3
  a4 = 1 + 1.625y + 0.177yz + 0.088y^2
  a5 = 1.75 + 0.677y + 0.375y^2 + 0.088y^2 z
  a6 = 2 + 0.604y + 0.187y^2 + 0.044y^2 z

I-102 S
  a1 = 1 + 1.061z + 0.177yz + 0.102y^2 z + 0.042xy^2 z + 0.021y^3 z
  a2 = 1.5 + 0.250y + 0.707z + 0.144y^2 + 0.059xy^2 + 0.029y^3
  a3 = 1.5 + 0.5y + 0.354z + 0.287y^2 + 0.118xy^2 + 0.059y^3
  a4 = 1 + 1.325y + 0.236xy + 0.118y^2 + 0.177yz
  a5 = 1.408 + 0.577x + 0.697y + 0.306y^2 + 0.072y^2 z
  a6 = 1.408 + 0.236x + 0.873y + 0.125y^2 + 0.029y^2 z

I-143
  a1 = 1 + 1.061z + 0.177yz + 0.116y^2 z + 0.010y^2 z^2 + 0.042xy^2 z
  a2 = 1.5 + 0.250y + 0.707z + 0.165y^2 + 0.059xy^2 + 0.015y^2 z
  a3 = 1.5 + 0.5y + 0.354z + 0.329y^2 + 0.118xy^2 + 0.029y^2 z
  a4 = 1 + 1.408y + 0.236xy + 0.236yz
  a5 = 1.612 + 0.577x + 0.408y + 0.144z + 0.306y^2 + 0.072y^2 z
  a6 = 1.908 + 0.236x + 0.167y + 0.354z + 0.125y^2 + 0.029y^2 z

GSH
  a1 = 1 + 1.092z + 0.056xz + 0.157yz + 0.110y^2 z + 0.038xy^2 z + 0.002y^3 z
  a2 = 1.544 + 0.079x + 0.223y + 0.707z + 0.157y^2 + 0.054xy^2 + 0.003y^3
  a3 = 1.742 + 0.192x + 0.544y + 0.289z + 0.384y^2 + 0.133xy^2 + 0.008y^3
  a4 = 1 + 1.413y + 0.328xy + 0.118yz + 0.076y^2 + 0.039y^3 + 0.013xy^3
  a5 = 1.723 + 0.609x + 0.456y + 0.291y^2 + 0.032xy^2 + 0.023y^3 + 0.048y^2 z
       + 0.016y^4 + 0.005xy^4
  a6 = 2.180 + 0.314x + 0.285y + 0.118y^2 + 0.013xy^2 + 0.009y^3
       + 0.020y^2 z + 0.007y^4 + 0.002xy^4


These results are classified using the best 10 Euclidean distances test (in
the case of the second lot of 5 molecules, all the d_αβ are contained in the list). In
Table X we have reported the computed classification, which contains the total
frequency of appearance of each pair of molecules and the number of times n that the
pair occurs at the first, second, ..., tenth rank.

This classification enables us to put the discussion in the framework of the (mn)^{-1/2}
topological descriptor applied to atomic indexes, which is more sensitive to proximal
substituents R than to more distant ones (see Fig. 4).
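The classification step can be sketched as a tally of rank frequencies over all weight sets. The index vectors used below are toy functions invented for the example (they are not the expressions of Tables VIII and IX), and the 0.4 to 1.2 weight grid is an assumption; only the ranking and counting logic is meant to mirror the procedure described.

```python
from collections import defaultdict
from itertools import combinations, product
from math import dist


def toy_indexes(x, y, z):
    """Hypothetical 4-component index vectors for three made-up molecules."""
    return {
        "A": (1 + z, 1.5 + y, 1.5 + z, 1 + y),
        "B": (1 + 1.1 * z, 2.0 + y, 1.6 + z, 1 + 1.2 * y),
        "C": (1 + z + x, 1.5 + x + y, 1.7 + x, 1 + y + 0.2 * x),
    }


def rank_frequencies(index_fn, weight_sets, n_best=10):
    """Count how often each pair occupies each of the n_best smallest distances."""
    freq = defaultdict(lambda: [0] * n_best)
    for w in weight_sets:
        vecs = index_fn(*w)
        # Rank all unordered pairs by Euclidean distance, smallest first.
        ranked = sorted(combinations(sorted(vecs), 2),
                        key=lambda p: dist(vecs[p[0]], vecs[p[1]]))
        for rank, pair in enumerate(ranked[:n_best]):
            freq[pair][rank] += 1
    return dict(freq)


grid = list(product((0.4, 0.6, 0.8, 1.0, 1.2), repeat=3))
table = rank_frequencies(toy_indexes, grid)
for pair, counts in sorted(table.items()):
    # Total frequency T, then the per-rank counts, as in Tables X and XI.
    print("-".join(pair), sum(counts), counts)
```

Because every matrix contributes exactly one pair to each occupied rank, each rank column of such a table sums to the number of matrices, which gives a quick consistency check on tabulated frequencies.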
Table X. Frequency of appearance of pairs of molecules derived from cysteamine in the
10 best d_αβ in 125 similarity matrices (T is the total frequency of appearance over the
ten ranks).

Pair of                     Rank
molecules      T     1    2    3    4    5    6    7    8    9   10

CYSA-ME2C    125    41   46   20   11    6    0    1    0    0    0
CYSA-3MEC     71     0   12    4   13    1    4    6    9   17    5
CYSA-AET     124    57   12    9    8    7    4    5    7    8    7
CYSA-CYSE    125    23   35   10    9   20    5    9    4    6    4
CYSA-PENI     48     0    0    0    0    0    9    1    2   19   17
ME2C-3MEC     47     0    0    0    0    1    5    3    4   10   24
ME2C-AET     125     4   14   33    7   22   19   16    8    2    0
ME2C-CYSE    125     0    0   20   48   23   17   13    1    3    0
ME2C-PENI    107     0    2   10   16    8    7   28   20    9    7
3MEC-AET      43     0    0    4    3    7    3    3    9    9    5
3MEC-CYSE     23     0    0    0    0    0    0    0    0    8   15
3MEC-PENI      0     0    0    0    0    0    0    0    0    0    0
AET-CYSE     125     0    0   14    6   13   43   22   13    9    5
AET-PENI      59     0    0    0    0    0    2    8    5   18   26
CYSE-PENI    103     0    4    1    4   17    7   10   43    7   10

For the first family of molecules derived from cysteamine, it is easy to verify
that the most similar couples are cysteamine-Me-2-cysteamine, cysteamine-AET,
and cysteamine-cysteine. Other pairs having a frequency of appearance of 125 are
Me-2-cysteamine-AET, Me-2-cysteamine-cysteine, and AET-cysteine. This is a
very logical result, since cysteamine derives directly from cysteine by decarboxylation
and Me-2-cysteamine is the closest isotopologous neighbor of cysteamine.

Another  important  fact  is  that  Me-2-cysteamine  is  a  good  radioprotector 
(DRF  =  1.8  >  DRF  of  cysteamine)  with  the  advantage  of  reducing  the  toxic  effects 
of  cysteamine. 

It is also very interesting to note the exclusion of the penicillamine-N-3Me-cysteamine
couple.

When observing the classification obtained for the second family of
molecules, derived from the main chain of WR-2578 S, it is interesting to observe
that the most similar couples are WR-2578 S-WR-1065, WR-2578 S-I-102 S,
WR-1065-I-143, I-102 S-I-143 (the best frequency of appearance in first rank), and
WR-1065-I-102 S (Table XI).

These similarities are reasonable with regard to the properties of the (mn)^{-1/2}
connectivity index. Concerning the pairs containing GSH, it is also logical to find a poor
similarity in the first ranks of Table XI, because of the great size of glutathione.

Nevertheless, it is interesting to note that the GSH-I-143 couple is the most similar
of these, with a frequency of appearance of 36 in the first five ranks; this frequency is
only 17 for the WR-1065-GSH pair, 6 for the WR-2578 S-GSH pair, and zero for the
I-102 S-GSH pair.
Table XI. Frequency of appearance of pairs of molecules derived from WR-2578 S in 125
similarity matrices.

Pair of                     Rank
molecules      T     1    2    3    4    5    6    7    8    9   10

2578-1065    125    23   44   13   19   23    2    1    0    0    0
2578-I102    125    25   27   20   22   27    4    0    0    0    0
2578-I143    125     5    4   16   15    9   24   19   17   16    0
2578-GSH     125     0    0    4    1    1    5    6   14   21   73
1065-I102    125     9   15   27   20   12   24    5    5    6    2
1065-I143    125    17   19   16   20   24   18    2    4    2    3
1065-GSH     125     0    0    2    6    9   17   42   31   18    0
I102-I143    125    46   15   11   13   10   13    9    6    1    1
I102-GSH     125     0    0    0    0    0    0   17   28   45   35
I143-GSH     125     0    1   16    9   10   18   24   20   16   11
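The entries of Table XI can be cross-checked by simple sums. In the sketch below the two entries that are illegible in the scan (rank 3 of I102-I143 and rank 10 of I143-GSH) are taken to be 11, an assumption that follows from the row totals of 125; the dictionary layout is purely illustrative.

```python
# Rank frequencies (ranks 1..10) transcribed from Table XI.
TABLE_XI = {
    "2578-1065": [23, 44, 13, 19, 23, 2, 1, 0, 0, 0],
    "2578-I102": [25, 27, 20, 22, 27, 4, 0, 0, 0, 0],
    "2578-I143": [5, 4, 16, 15, 9, 24, 19, 17, 16, 0],
    "2578-GSH": [0, 0, 4, 1, 1, 5, 6, 14, 21, 73],
    "1065-I102": [9, 15, 27, 20, 12, 24, 5, 5, 6, 2],
    "1065-I143": [17, 19, 16, 20, 24, 18, 2, 4, 2, 3],
    "1065-GSH": [0, 0, 2, 6, 9, 17, 42, 31, 18, 0],
    "I102-I143": [46, 15, 11, 13, 10, 13, 9, 6, 1, 1],   # rank 3 inferred
    "I102-GSH": [0, 0, 0, 0, 0, 0, 17, 28, 45, 35],
    "I143-GSH": [0, 1, 16, 9, 10, 18, 24, 20, 16, 11],   # rank 10 inferred
}

# Each pair appears once per matrix, and each rank is filled once per matrix.
row_totals = {pair: sum(counts) for pair, counts in TABLE_XI.items()}
column_totals = [sum(counts[r] for counts in TABLE_XI.values()) for r in range(10)]

# Frequency in the first five ranks for the pairs containing GSH.
first_five = {pair: sum(counts[:5]) for pair, counts in TABLE_XI.items()
              if pair.endswith("GSH")}

print(row_totals)
print(column_totals)
print(first_five)
```

All row and column totals come out at 125, and the first-five-rank sums for the GSH pairs reproduce the values 36, 17, 6, and 0 quoted in the text.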

Conclusion 

From the results as a whole, we can emphasize the following points:

(1) The mathematical topological description introduced by Randić gives a good
framework for discussing the similarity in the studied aminothiol class of molecules.

(2) In the global approach of similarity in the vector space of paths, the most
important fact is the selection of the WR-1065-I-102 S couple, which are both
radioprotective and anticancer drugs. Another interesting point is the exclusion of
penicillamine from the lot as a nonradioprotective molecule.

(3) Analysis of the similarity, when using the (mn)^{-1/2} atomic index, leads to
logical couples of molecules in the two studied families.

The newly synthesized drugs I-102 S and I-143 are associated with the best
frequency of appearance in the first rank. It is also notable that for the most
similar pair involving glutathione (GSH-I-143) we obtain an association between two
molecules with a weak radioprotective effect.

This study shows that, when introducing the ideas and the vocabulary of Dubois
[24], we can say: If we start with a substructure (cysteamine), we can build a
structure (WR-2578 S fragment), then a hyperstructure (GSH).

All  molecules  of  these  families  are  isotopologous  and  isochromatic  (same  atoms  at 
the  nodes)  with  regard  to  the  main  chains  described  in  Figure  4. 

Finally,  this  work  suggests  new  lines  of  future  research  associating  topology  and 
quantum  properties  with  the  aim  of  new  perspectives  in  drug  design. 
Abstract

Mobile protons in water bound at the interface with the membrane are analogous in a number of respects
to the conduction electrons in a thin layer of metal spread on the surface of a dielectric. The hypothesis
analyzed in this paper is that such mobile hydrogen bonds may be paired by propagating electronic
oscillations in polar side groups of the membrane. The model is dual to the BCS theory of electronic
superconductivity in that (1) the mobile H+ bonds play the role of conduction electrons and (2) the pairing
interaction has its origin in electronic excitations rather than lattice vibrations. The pairing mechanism is
similar to that in Little's proposed room temperature organic superconductor, except that it involves
transient alterations in ground state energies of polarizable electrons in a colorless membrane rather than
polarization of dye-like side groups. Numerical estimates based on formulae applicable to metallic
superconductors show that condensation of proton pairs would be feasible in small connected domains on
the membrane surface if the water structure is closely packed and the effective mass of the protons
sufficiently reduced (though significant disanalogies between the bound water and metal cases make the
quantitative applicability of these formulae unlikely). Some of the factors that favor a superfluid transition
are high salt concentration, polarizable macromolecules on the membrane that increase the
three-dimensionality of the interaction, and the presence of proton donors and acceptors other than oxygen.
The dynamic order inherent in a proton superflow could provide the basis for a wide variety of highly
ordered motions in biological systems.

Introduction 

The manifest spatial and temporal coherence of biological organisms led early
investigators in the fields of quantum theory and superconductivity to suggest that low
temperature order might play a role in living matter [1,2]. However attractive these
speculations, the difficulty remains that no substance has been found that convincingly
exhibits any of the known forms of superfluidity at physiological temperatures.
This is connected with the fact that superfluid behavior is based on a Bose condensation
of paired particles which individually obey the exclusion principle. All known
mechanisms of pairing yield pairing energies too weak relative to kT for condensation
to occur.

The pertinent example is the BCS theory of superconductivity. The superfluidity of
the electrons is based on an exchange of virtual phonons which derives from the
interaction with nuclei in the metal lattice [3-6]. As an electron moves through the
lattice it attracts the positively charged nuclei, which in turn attract another electron.
This indirect attraction increases as the shift in charge density associated with the
nuclear coordinates increases, hence increases as the nuclei are more displaced from
their equilibrium positions. As a consequence, the transition to superfluidity occurs at
a temperature proportional to M^{-1/2}, where M is the isotopic mass of the lattice nuclei
that mediate the pairing interaction. As long as lattice nuclei are the source of the
pairing interaction, M is never small enough to accommodate a transition to the
superconductive state at physiological temperatures.

Little [7] has proposed that it might be possible to synthesize an organic molecule
with superconducting properties at room temperature by replacing the nuclear
oscillators in the BCS theory with virtual electronic oscillators. In Little's model, the free
electrons are replaced by delocalized electrons that occupy the spine of an organic
polymer and the lattice nuclei are replaced by electrons in dye-like side groups. To
the extent that the analogy is correct, the isotope effect suggests that the transition
temperature could be as much as (M/m_e)^{1/2} times larger than for a metallic
superconductor, where m_e is the mass of the electron. This is on the order of 300 times larger,
well above room temperature. Nevertheless, despite intense efforts, no room
temperature organic superconductor has been synthesized. One problem is that organic
polymers are effectively one-dimensional systems (in the absence of stacking), but a
superfluid phase transition is only possible in systems with dimensionality greater
than two, thus, at minimum, in a surface of finite thickness [8-10]. A second problem
may be that the dynamics of any two groups of electrons in a single organic molecule
are too sensitively dependent on one another for one to induce a pairing interaction
in the other.
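The order of magnitude of the (M/m_e)^{1/2} factor is easy to check. The sketch below approximates the nuclear mass as the mass number times the proton mass, which is an assumption made for simplicity; the function name and the chosen mass numbers are illustrative only.

```python
from math import sqrt

# Proton-to-electron mass ratio (rounded CODATA value).
PROTON_TO_ELECTRON_MASS = 1836.15


def electronic_enhancement(mass_number):
    """sqrt(M / m_e), with the nuclear mass M taken as mass_number * m_p."""
    return sqrt(mass_number * PROTON_TO_ELECTRON_MASS)


# Illustrative mass numbers spanning light to heavy lattice nuclei.
for mass_number in (27, 50, 93, 207):
    print(mass_number, round(electronic_enhancement(mass_number)))
```

For a mass number near 50 the factor is close to 300, which is consistent with the estimate quoted in the text; lighter and heavier nuclei give somewhat smaller and larger factors.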

The purpose of this article is to suggest a new mechanism of superflow behavior in
biological systems that is in a sense the dual of the BCS theory. As in the Little
model, electronic oscillations provide the coupling mechanism. However, the particles
paired are protons in an adjacent aqueous film rather than electrons belonging
to the same molecule. The boundary conditions required for this type of interaction
are met at water-lipid interfaces, such as occur with all biological membranes. The
hypothesis is that mobile hydrogen bonds in the layer of bound water are paired
by propagating electronic oscillations in the polar side groups of the membrane.

Numerical estimates based on this analogy are necessarily highly approximative,
especially as the analogy is strained at a number of potentially critical points. At this
stage we could probably present the model in a variety of different quantitative
regimes. However, as a detailed model is more likely to be clear and useful than
a vague one, we will adopt the policy of making specific choices among alternative
assumptions wherever possible. We note that the oscillations will involve shifts in
the ground state energy of a colorless membrane rather than polarization of dye-like
side groups.

Bound  Water  and  Metal  Compared 

The structure of the membrane and the proposed interaction are schematically
illustrated in Figures 1 and 2. In this section we briefly review some pertinent facts about
the membrane and consider the extent to which hydrogen bonds near the membrane
can be treated as a system of weakly interacting fermions analogous to the weakly
interacting electrons of a metallic superconductor.

In electron micrographs, biological membranes appear as two high electron density
layers spanning approximately 75 Å and sandwiching a layer of low electron density.

Figure 1. Lipid bilayer. The ionic and polar groups of phospholipids (represented by
circles) are in contact with the layer of bound water. Wavy lines represent fatty acid chains.
Surface and integral proteins are not shown [see Ref. 11].

Figure 2. Schematic diagram of membrane mediated proton-proton pairing interaction.
Circles represent polar head groups and wavy lines fatty acid chains belonging to one of the
two surfaces of a lipid bilayer. The hopping of a proton in the water layer at time t attracts
an electron in a nearby polar group (top left). This initiates an oscillatory wave of electronic
polarization in the membrane, and at some later time a polarized side group attracts a
second hopping proton (bottom right). The two protons are attracted to each other by virtue
of being attracted to the same mobile charge center.


The model generally accepted has as its essential component a lipid bilayer, with
nonpolar groups pointing inward and arrays of polar groups pointing outward into the
aqueous medium [11, 12]. A large fraction of this surface is coated with proteins,
glycoproteins, and mucopolysaccharides that also have outward pointing polar
groups. Other proteins penetrate into the bilayer or extend through it. The pertinent
features of water are the protons, or mobile hydrogen bonds, that hop between oxygen
atoms. Molecular dynamics studies indicate that, due to cooperative effects of proton
acceptance on proton donation, water structures itself into networks of hydrogen
bonds with uninterrupted pathways running in all directions from any macroscopic
region of the liquid [13]. The structure of bound water is still conjectural. However,
nuclear magnetic resonance (NMR) studies indicate that structuring occurs in layers,
with layers closest to the membrane having greater organization and a larger fraction
of mobile protons [14, 15]. The inner layer probably spans 3 to 5 molecular diameters
(ca. 8.4 to ca. 14.0 Å), but how far the influence of the membrane extends altogether
is extremely uncertain [16]. The innermost hydrogen bonds extend into the interstices
between phospholipid groups, and proton channels consisting of chains of hydrogen
bonds are believed to extend across the membrane. These channels, which have
given rise to the "proton wire" concept, are probably associated with proteins
[17-19], though some authors have viewed them as comprising purely aqueous hydrogen
bonds. In either case it is likely that aqueous and nonaqueous hydrogen bonds become
mixed at the water-membrane and water-protein interface.

In the hypothesis to be analyzed, free protons hopping through the water lattice at
the interface with the membrane attract electrons associated with the polar or ionic
groups of phospholipid or with bilayer-associated proteins. When the proton hops
again, the residual displacement of the electron exerts an attractive effect on a second
proton. Following Ginzburg [20] we shall borrow the term "exciton" for this
electronic excitation and tentatively picture it as a transverse wave of polarization
propagating through the lattice of polar side groups on the surface of the bilayer. As in the
BCS model, the intuitive picture is that particles attracted to the same charge center
act as if they are attracted to each other, except that in this case the vibrations of the
lattice nuclei are replaced by propagating disturbances in the electronic structure of
the membrane and the conduction electrons are replaced by mobile protons in the
layer of bound water. When a proton hops, the residual induced membrane charge
should make it more favorable for a neighbor in the chain of hydrogen bonds to hop.
Thus it is likely that the interaction between membrane electrons and water protons
yields a facilitating effect on the flow of protons that is more local than the pairing
interaction, but which is desirable from the standpoint of pairing because it
enhances the proton wire aspect of the hydrogen bonds near the membrane.

As in Little's discussion of a linear organic molecule, the consideration underlying
the "superwater" concept is that, all else constant, the isotope effect could in
principle accommodate an increase in the coupling interaction by a factor of (M/m_e)^{1/2}, or
about 300 times over that in a metallic superconductor. However, since the
water-membrane interface is a surface of finite thickness, a basic requirement for the
occurrence of a Bose condensation to a coherent state is met. Furthermore, the absence of
covalent interactions between the lipid-protein phase and the water phase allows the
electronic oscillations in the former to provide the excitonic interaction that couples
the protons in the latter without compromising the integrity of normal chemistry, as
would likely be the case if all the interactions occurred in a single molecule.

As a consequence of this chemical distinctness, the protons in bound water are in
some measure analogous to the conduction electrons in a thin layer of metal spread on
the surface of a dielectric [see Refs. 21, 22]. The difference is that the protons in
normal water are localized, their free motion consisting in a diffusive hopping which
corresponds to very tight binding and which is probably more analogous to localized
electrons that diffuse than to electrons moving relatively freely in a periodic potential.
The attraction between H+ and an oxygen atom is ca. 0.2 eV, almost one-twentieth of
the covalent OH bond. Whether protons behave as pairs, however, depends not on
the strength of these bonds, which can be viewed as a modification of the potential in
which they move, but on whether the exciton mediated H+-H+ bond is sufficiently
strong to overcome the Coulomb repulsion and thermal disruption. A second
difference is that the ground state of conduction protons in water is not supported by an
aufbau structure of lower energy protons as are the conduction electrons at the Fermi
level in a metal. The hopping of protons in normal water is a fluctuation phenomenon,
rather than a minimum kinetic energy compatible with the exclusion principle. This is
connected with the fact that mobile protons are too heavy and occur with too low a
concentration in water for Bose-Einstein statistics to be operative at body
temperature. In order for the protons to be degenerate (that is, sufficiently delocalized for
the wave function to apply to the whole proton system), it would be necessary for
their effective mass to be substantially reduced at the water-membrane interface. For
these and other reasons, the surrounding interactions that determine whether a
coherent state can emerge are very different in water, regardless of its detailed structure in
the bound state, than in metals.
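The role of the effective mass can be illustrated with the standard textbook criterion that quantum statistics become important when the thermal de Broglie wavelength is comparable to the interparticle spacing, that is, when n λ^3 is of order one. This criterion and the helper names below are assumptions brought in for illustration; they are not taken from the paper, and the number density used in the example is arbitrary.

```python
from math import pi, sqrt

H_PLANCK = 6.626e-34      # Planck constant, J s
K_BOLTZMANN = 1.381e-23   # Boltzmann constant, J/K
M_PROTON = 1.6726e-27     # free proton mass, kg


def thermal_wavelength(mass, temperature=310.0):
    """Thermal de Broglie wavelength h / sqrt(2 pi m k T), in metres."""
    return H_PLANCK / sqrt(2 * pi * mass * K_BOLTZMANN * temperature)


def degeneracy_parameter(number_density, mass, temperature=310.0):
    """n * lambda^3; values of order one or larger indicate degeneracy."""
    return number_density * thermal_wavelength(mass, temperature) ** 3


lam_free = thermal_wavelength(M_PROTON)
lam_light = thermal_wavelength(M_PROTON / 100)   # hypothetical reduced effective mass
print(lam_free * 1e10, lam_light * 1e10)         # wavelengths in angstroms

n_example = 1e27   # arbitrary illustrative density, particles per m^3
print(degeneracy_parameter(n_example, M_PROTON),
      degeneracy_parameter(n_example, M_PROTON / 100))
```

At body temperature the free proton wavelength is about 1 Å. Because the wavelength scales as m^{-1/2}, a hundredfold reduction in effective mass lengthens it tenfold and raises n λ^3 a thousandfold, which is the sense in which a reduced effective mass favors degeneracy.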

The BCS model ignores surrounding interactions, treating a metal as a system of
weakly interacting fermions in a square well [23]. The water-membrane interface
probably does not lend itself to such abstraction; nevertheless, we will proceed on the
basis of the working assumption that major factors determining whether a coherent
state can occur are represented in the BCS framework and that it is therefore
instructive to consider whether they are more or less favorably met by conduction protons in
water or conduction electrons in metal. In making this comparison we will first
consider the feasibility of H+-H+ bonds occurring at the interface assuming a normal
water structure, and then consider how the picture has to be modified in order to
satisfy the requirements for Bose-Einstein statistics to be operative. That the bilayer is
dressed with a forbidding variety of proteins and carbohydrates will be viewed as a
simplifying rather than as a complicating factor, since it is entirely justified to
postulate that these macromolecules could be selected in the course of evolution to tailor
the interactions at the water-membrane interface to favor a superfluid transition.


Feasibility  of  Proton  Pairing 

The matrix element of the interaction energy between two protons can be written
as V = V_Coul + V_ex, where V_Coul is the Coulomb repulsion and V_ex is the
exciton-mediated proton-proton attraction. Whether proton pairs will form depends on the
strength of the Coulomb repulsion, the strength of the excitonic proton-proton
attraction, the relaxation time of proton-proton correlations, the effectiveness with which
excitons are created and absorbed due to the interaction of mobile H+ bonds with
membrane electrons, and the constraints imposed on exciton energy by the molecular
structure of the membrane.
Coulomb Repulsion

The Debye-Hückel formula for the proton pair potential in an aqueous solution
may be approximated by

$$V_{\mathrm{Coul}}(r) = \frac{Q^2}{D r}\, e^{-\kappa r} \tag{1}$$

where Q is the proton charge, D is the dielectric constant, and r is assumed large in
comparison to the proton collision parameter (see, e.g., Refs. 24, 25). The inverse
length, κ, is related to the ionic strength (1/2) Σ n_i Z_i^2 by

$$\kappa^2 = \frac{4\pi Q^2}{D k_b T} \sum_i n_i Z_i^2 \tag{2}$$

where Z_i is the valence of ion i, n_i the average number per unit volume of this ion,
and k_b is Boltzmann's constant. This corresponds to the screening parameter,
κ^2 = 4πe^2 n_0 / k_b T, for the electron-electron interaction in a metal, except that
Σ n_i Z_i^2 is replaced by n_0, the local average density of charge carriers [26].

The Coulomb interaction is attenuated by the high dielectric constant of water
($D \approx 80$). The main factor, however, is screening. In a good metallic superconductor
$n_0$ is typically ca. $10^{23}$ cm$^{-3}$, leading to an effective screening distance of ca. .5 to
ca. 1 Å (see Ref. 27). Sea water, with a composition very close to that of blood, has
a concentration of about $5 \times 10^{-1}$ mol/L NaCl, giving an effective screening distance of
less than 10 Å due to this salt alone. Ionic concentrations inside cells can differ
substantially due to ionic pumps, as in the replacement of Na+ by K+ as the chief
intracellular cation at the resting potential. Screening is also influenced by Mg2+,
by substantial and highly variable reservoirs of Ca2+, by H3O+ and OH- (only $2 \times 10^{-7}$
mol/L in normal water, but higher in bound water), and by organic anions. The latter
include soluble polyelectrolytes, such as soluble proteins, which have a major screening
effect due to their high $Z$ value. These must be distinguished from the charged
mucoproteins and glycoproteins on the membrane, which represent fixed charges,
and which are, therefore, more likely to contribute to mediating the excitonic interaction
between protons than to the screening of the Coulomb repulsion. Finally, the
membrane appears to concentrate salts at a level higher than their average intracellular
concentration, suggesting that screening would be especially strong at the
membrane-water interface (see Ref. 28).
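
As a rough check on the statement that .5 mol/L NaCl alone screens within 10 Å, the following Python sketch evaluates the inverse length of Eq. (2) in CGS units. The function name, the rounded constants, and the choice of T = 298 K with D = 80 are assumptions made for illustration only.

```python
import math

# Rounded CGS-Gaussian constants (illustrative values)
Q = 4.803e-10     # proton charge, esu
K_B = 1.381e-16   # Boltzmann constant, erg/K
N_A = 6.022e23    # Avogadro's number, 1/mol


def debye_length_angstrom(ions, dielectric=80.0, temperature=298.0):
    """Return the screening length 1/kappa of Eq. (2) in angstroms.

    ions: iterable of (molar concentration in mol/L, valence Z) pairs.
    """
    # Convert mol/L to ions per cm^3 and form the sum of n_i * Z_i^2.
    sum_n_z2 = sum(c * N_A / 1000.0 * z ** 2 for c, z in ions)
    kappa_sq = 4.0 * math.pi * Q ** 2 / (dielectric * K_B * temperature) * sum_n_z2
    return 1.0e8 / math.sqrt(kappa_sq)  # cm -> angstrom


# 0.5 mol/L NaCl: equal concentrations of Na+ (Z = +1) and Cl- (Z = -1)
nacl_half_molar = [(0.5, +1), (0.5, -1)]
length = debye_length_angstrom(nacl_half_molar)
print(f"screening length for 0.5 mol/L NaCl: {length:.1f} angstrom")
```

Under these assumptions the screening length comes out at a few ångströms, consistent with the bound of less than 10 Å quoted above; multivalent ions enter through the $Z_i^2$ weighting and shorten it further.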

The extent of screening evidently depends sensitively on a number of salts and
polyelectrolytes whose concentrations are influenced by intracellular control mechanisms.
However, no unusual assumptions are necessary in order to obtain levels of
screening sufficient to prevent the Coulomb repulsion from overriding a proton-proton
pairing interaction.

Inorganic ions and polyelectrolytes have a second pertinent effect, distinct from
screening. Their presence significantly alters the network of hydrogen bonds, modifying
the flow of protons and thereby potentially facilitating the pairing interaction.


Proton-proton  Interaction 

In analogy to the electron-phonon case and following Little [7, 29] and Ginzburg
[20, 21] we write the matrix element of the proton-exciton interaction as

$$V_{\mathrm{ex}} = -\frac{1}{\hbar}\left[\frac{2\,|M_q|^2\,\omega_{\mathrm{ex}}(q)}{\omega_{\mathrm{ex}}^2(q) - \omega_{\mathrm{pr}}^2}\right] \qquad (3)$$

where $\hbar\omega_{\mathrm{ex}}(q) = E'_{\mathrm{el}} - E_{\mathrm{el}}$ is the energy of an exciton of momentum $\hbar q$, associated
with a charge density fluctuation of a membrane electron between energy levels $E'_{\mathrm{el}}$
and $E_{\mathrm{el}}$; $\hbar\omega_{\mathrm{pr}} = E'_{\mathrm{pr}} - E_{\mathrm{pr}}$ is the shift in proton energy associated with a charge
density fluctuation in the proton system from a state of momentum $(p_1, p_2)$ to a state of
momentum $(p'_1, p'_2)$; and $M_q$ is the matrix element for emission and absorption of
excitons that connects the excited states of the membrane to the protons of water. If
$\omega_{\mathrm{ex}} > \omega_{\mathrm{pr}}$ overscreening occurs and the interaction is attractive, the strength of the
interaction increasing as the resonant frequency is more closely approached and as $M_q$
increases. A proton-phonon interaction analogous to the conventional electron-phonon
interaction should also operate at the membrane. This would be repulsive,
but can be ignored since the massiveness of membrane molecular structures means a
small $\omega_{\mathrm{ph}}$. In this section we consider plausible constraints on each of the terms in
Eq. (3).

Proton Motions in Normal Water. A liter of pure liquid water (ca. 55 mol) at
298 K contains about $6 \times 10^{16}$ proton charge carriers at any given time, each with a
mean lifetime of about $10^{-12}$ s in the nonhopping state [17, 30]. These protons may be
thought of as undergoing harmonic oscillations in local wells of a periodic potential,
but occasionally hopping (e.g., tunneling) to a neighboring well [31]. If a proton
hops into a well, another one hops out, though not necessarily in the same direction.
To get a rough estimate of the significance of replacing conduction electrons with
protons, we will take the rather naive view that the proton is traveling along a chain
of hydrogen bonds and that a local average velocity can be assigned to it; that is, we
for the moment ignore the fact that the translational symmetry of the momentum is
badly broken by the interaction between H+ and O- and that the momentum of the
proton is largely transmitted to the whole water-membrane lattice after each hop.
This picture of a "proton wire" becomes more useful near the membrane, where the
number of proton charge carriers should be significantly increased, and as mentioned
earlier has been used to picture the migration of protons across the membrane in the
presence of a potential difference.

With this in mind, we write $E'_{\mathrm{pr}} - E_{\mathrm{pr}} = (p'^2 - p^2)/2m_{\mathrm{pr}}$. Substituting
$p' = p + \hbar q$ gives $E'_{\mathrm{pr}} - E_{\mathrm{pr}} = (2p\hbar q + \hbar^2 q^2)/2m_{\mathrm{pr}}$. Since $m_{\mathrm{pr}} \sim 10^3 m_{\mathrm{el}}$, the energy
change in the proton system would be reduced by a factor of $10^3$ even if $p$ could have
the same value as in the electron case. Setting $p = m_{\mathrm{pr}} v_{\mathrm{pr}}$ gives
$E'_{\mathrm{pr}} - E_{\mathrm{pr}} = v_{\mathrm{pr}}\hbar q + \hbar^2 q^2/2m_{\mathrm{pr}}$. Since the water molecular diameter is 2.8 Å and since the mobile proton
remains associated with a water molecule for only ca. $10^{-12}$ s, we can take
$v_{\mathrm{pr}} \sim 2.8 \times 10^4$ cm/s as the effective velocity (recognizing that in reality the proton
motion consists of constituent tunneling, rotational, and translational motions and
that the hops are not necessarily in the same direction). Alternatively, taking $\Delta x$ in
the uncertainty relation $\Delta p \geq \hbar/\Delta x$ as the water molecular diameter gives a value of
$\Delta v_{\mathrm{pr}} \approx 2.2 \times 10^4$ cm/s, which is sufficiently similar to $v_{\mathrm{pr}} \approx 2.8 \times 10^4$ cm/s to
indicate that tunneling is the enabling motional process. If we take $\Delta x = .78$ Å (the
average minimum distance that H+ must traverse when it jumps from one oxygen
atom to another), $\Delta v_{\mathrm{pr}} \sim 7.9 \times 10^4$ cm/s, indicating that the maximum velocity is
not too different from the average velocity. The difference is due to the fact that the
rotation of the water molecule is the rate-limiting step. The NMR-measured value of
the rate of proton transfer in the reaction H2O + H3O+ → H3O+ + H2O is
$10.6 \times 10^9$ liter mol$^{-1}$ s$^{-1}$ [30, 32]. Expressed in H+ transfers per second, this
corresponds to a velocity $v_{\mathrm{pr}} \sim 1.7 \times 10^4$ cm/s, the same order of magnitude as our
calculated proton velocities. Since the typical velocity of conduction electrons at the
Fermi surface at 0 K is ca. $10^8$ cm/s, $v_{\mathrm{el}}/v_{\mathrm{pr}} \sim 10^4$, implying that $\hbar\omega_{\mathrm{pr}}$ is between
$10^3$ and $10^4$ times smaller than $\hbar\omega_{\mathrm{el}}$. Even if the proton velocity were substantially
enhanced in H-bonded chains at the membrane interface the bracketed term in Eq. (3)
is certain to be positive, satisfying the sine qua non requirement for pairing to occur.
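
The velocity estimates in this paragraph are simple enough to reproduce numerically. The sketch below is only an illustration: the helper names are hypothetical, the constants are rounded CGS values, and the conversion of the NMR rate assumes ca. 55 mol/L of water and one 2.8 Å step per transfer, as in the text.

```python
HBAR = 1.055e-27    # reduced Planck constant, erg s
M_PR = 1.67e-24     # proton rest mass, g
ANGSTROM = 1.0e-8   # cm


def hop_velocity(step_cm, residence_time_s):
    """Average velocity for one step of length step_cm per residence time."""
    return step_cm / residence_time_s


def uncertainty_velocity(delta_x_cm, mass_g=M_PR):
    """Velocity spread from delta_p ~ hbar / delta_x for a particle of mass_g."""
    return HBAR / (mass_g * delta_x_cm)


# One water diameter (2.8 angstrom) per 1e-12 s residence time
v_hop = hop_velocity(2.8 * ANGSTROM, 1.0e-12)

# Uncertainty-relation estimates for the two localization lengths in the text
dv_diameter = uncertainty_velocity(2.8 * ANGSTROM)
dv_jump = uncertainty_velocity(0.78 * ANGSTROM)

# NMR rate constant (L mol^-1 s^-1) times water concentration (mol/L)
# gives transfers per second; each transfer is taken as one 2.8 angstrom step.
transfers_per_s = 10.6e9 * 55.0
v_nmr = transfers_per_s * 2.8 * ANGSTROM

for label, v in [("hop", v_hop), ("dx = 2.8 A", dv_diameter),
                 ("dx = 0.78 A", dv_jump), ("NMR", v_nmr)]:
    print(f"{label:12s} {v:.2e} cm/s")
```

With these rounded inputs all four estimates fall between roughly $10^4$ and $10^5$ cm/s, within a few percent of the figures quoted in the text, which is the only point the comparison with the ca. $10^8$ cm/s Fermi velocity requires.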

This equation may thus be approximated by $V_{\mathrm{ex}} \sim -|M_q|^2/\omega_{\mathrm{ex}}(q)$, which emphasizes
the fact that $V_{\mathrm{ex}}$ will be big enough to overcome the Coulomb repulsion only if the
electronic excitation energy, $\hbar\omega_{\mathrm{ex}}(q)$, is not too large, otherwise "over" overscreening
will occur.

The relaxation time between collisions that alter the direction of an electron in a
metal is on the order of $10^{-14}$ s when defined through conductivity. The time of flight
of a mobile H+ between neighboring oxygen atoms is at least on the order of
$10^{-13}$ s. Thus there is sufficient time for an exchange of excitons to establish a
pairing interaction even though the proton velocity is not well defined over distances
greater than one hop in normal water.

Proton-Proton Correlations Between Hopping Events. In order for the proton-proton
interaction to be significant the protons must remain paired during the
$10^{-12}$ s interval in the nonhopping state. Water is especially well suited to this due to
the range of hydrogen bond distortions that are possible. Protons will remain paired
during the nonhopping interval if the exchange of excitons during the hopping interval
serves to correlate these distortional displacements. Thus when a hopping proton
imparts energy to a membrane electron and this attracts another hopping proton, a
correlation is established that will persist as a distortional correlation for a short
amount of time after the hopping is completed. In effect, the initial distortional states
of the H+ bonds are correlated in the nonhopping state by virtue of the exciton exchange
during the immediately preceding hopping state. When the protons hop away
from the oxygen atoms to which they have become associated there is an increased
likelihood that they will do so with some correlation. This will be the case as long as
the relaxation time of the distortional correlation exceeds $10^{-12}$ s. This condition
would be less stringent at the water-membrane interface to the extent that the presence
of highly specific molecular structures and the mixing of aqueous and nonaqueous
hydrogen bonds reduce the time between hops and increase the distortional
correlation time.

The mean free path of an electron in a metal is typically $10^3$ Å or more. This is
obviously very much larger than the mean free path of a proton in water if this is
identified with the distance of a hop. However, the mean free path should be identified
with the distance over which a correlation can be maintained. In normal water
correlations created during the hopping interval would have to be glued together by
residual distortional correlations during the nonhopping interval. Later we will see
that the proton degeneracy required for condensation would substantially reduce the
irregular character of hopping and thereby simplify the problem of maintaining correlations.
If condensation occurs correlations would be frozen in by the requirement
that most pairs have zero center of mass momentum.

Proton-Exciton Coupling. The coupling of excitons and protons, symbolized by
the matrix element $M_q$, is favored by proximity since the effect of an induced charge
falls off rapidly with distance. This condition is met by the water-membrane interface
since the protruding polar side groups are immediately adjacent to and interleaved by
the inner network of hydrogen bonds, the electron density reaching its maximum at
most 9.0 Å from the exterior of the membrane (see Ref. 28). Due to its nonpolar
character the interior of the membrane is not a good conductor of electricity. Generally,
poor conductors make superior superconductors since the particle-hole interaction
is stronger. The dielectric character of the interior suggests that membrane
electrons attracted toward the aqueous layer by passing protons will be subject to a
strong restorative force, yielding a significant particle-hole interaction. Hydration
layers more than one or two molecular diameters from the surface of the membrane
lose these advantages; however, protruding proteins and carbohydrates could serve to
carry the interaction beyond these layers, conferring the important element of three
dimensionality.

For the exciton-proton interaction to occur beyond the innermost layers it would
have to be transmitted by the water structure. It might be supposed that oxygen valence
electrons serve as a substrate for tightly bound excitons (similar to Frenkel excitons).
However, this would blur the chemical distinctness of the proton and exciton
systems. For the purposes of the discussion that follows we will therefore make the
more conservative assumption that the exciton interaction between protons is mediated
completely by membrane components and by organic moieties attached to the
membrane.

Spectroscopic and Bond Energy Constraints. We are now in a position to consider
constraints imposed on the exciton energy by membrane spectroscopy and
membrane molecular organization. The following considerations suggest that the
"exciton" is a transient alteration in the ground state energy of polarizable membrane
electrons with an upper limit of about .2 eV. Such transient ground state shifts do not
correspond to optical excitations, but they should be observable as a blurring of the
UV spectral lines.

1. Infeasibility of optical excitations. In the absence of pigments such as chlorophyll
or rhodopsin, membranes appear colorless. Oxidation produces some color and
samples consisting of a large number of membrane layers are said to have a yellowish
color, possibly due to carotene. The lipid components of the membrane absorb at less
than 200 nm, corresponding to electronic excitations of about 6 eV. Unpigmented
proteins and carbohydrates on the membrane increase this to 400 nm at most, decreasing
the excitation energy to about 4 eV. These values correspond to covalent
bond strengths (ca. 2 to ca. 6 eV) and are therefore too high to mediate a pairing interaction,
both from the standpoint of over-over screening and from the more critical
standpoint of the integrity of the membrane-water lattice. Membrane components and
water absorb in the infrared, the latter being opaque to the UV. These excitations
must also be excluded since they correspond to vibrational rather than electronic
modes.

Even if lateral propagation of excitations corresponding to spectroscopic levels
could occur it would do so with a low velocity. The time required for such excitations
is ca. $10^{-8}$ s. Assuming that neighboring phospholipid side groups are separated on
the average by the molecular diameter of water (since H+ can extend into the interstices
of these groups) we get a propagation velocity of at most 3 cm/s, as compared
to a typical phonon velocity of about $5 \times 10^5$ cm/s. Yet a high propagation velocity
is especially important for water-membrane superfluidity since the H+ bonds move
so much more slowly than metallic electrons and since screening of the Coulomb
interaction is not quite as strong. This argument does not apply to photons whose
energy does not equal a difference in the energy levels of the material; such photons
are reradiated with a change in phase in a time on the order of $10^{-15}$ s, giving an exciton
velocity that could be as high as $10^7$ cm/s.
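
The two propagation velocities quoted here follow from dividing the assumed side-group spacing by the transfer time. A minimal Python sketch, with a hypothetical helper name and the 2.8 Å water diameter taken as the spacing:

```python
SPACING_CM = 2.8e-8          # assumed side-group spacing: one water diameter
PHONON_VELOCITY = 5.0e5      # typical phonon velocity quoted in the text, cm/s


def propagation_velocity(transfer_time_s, spacing_cm=SPACING_CM):
    """Lateral velocity if the excitation advances one spacing per transfer time."""
    return spacing_cm / transfer_time_s


v_spectroscopic = propagation_velocity(1.0e-8)    # optical excitation lifetime
v_reradiated = propagation_velocity(1.0e-15)      # off-resonance reradiation time

print(f"spectroscopic excitation: {v_spectroscopic:.1f} cm/s")
print(f"reradiated photon:        {v_reradiated:.1e} cm/s")
print(f"ratio to phonon velocity: {v_spectroscopic / PHONON_VELOCITY:.1e}, "
      f"{v_reradiated / PHONON_VELOCITY:.1e}")
```

On these assumptions the spectroscopic route is slower than a typical phonon by about five orders of magnitude, whereas the off-resonance route is faster than the phonon, which is the contrast the argument relies on.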

2. Ground state energy shifts. The above arguments exclude spectroscopically observable
excitations from a fixed ground state as carriers of the pairing energy. However,
these arguments do not exclude transient shifts in the ground state energy of
membrane electrons associated with a time dependent potential. The time dependence
is initially due to the hopping of an H+ bond into the vicinity of a membrane electron.
This can be viewed as transiently altering the potential in which the electron
moves. This initial electronic motion alters the potential of electrons in neighboring
membrane moieties, leading to the propagation of the exciton along the membrane.
Such shifts in ground state energy occur when salt is added to the bathing solution
[33]. Typically the UV spectrum of a molecule shifts by about 5 nm in the presence
of salt. This is an average effect. The shifts due to an interaction with an individual
H+ would be much larger, but as fluctuations would only be observable as a slight
blurring of the UV spectral lines.

3. Bond energy and thermal constraints. It is reasonable to demand that the energy
shift be less than covalent bond energies and less than weak bond energies significant
for the integrity of the membrane-water lattice. Hydrogen bond energies
range from ca. .09 to ca. .4 eV, with the crucial aqueous H+ bonds being ca. .2 eV.
These bonds will re-form if disrupted by an exciton, but too much disruption would
burn the overall proton wire structure that we are assuming. Thus it is reasonable to
make the tentative assumption that the excitation energy should be less than
ca. .2 eV. The hydrogen bond energies and larger electrostatic energies dominate in
the polar region of the membrane, whereas van der Waals forces (.04 eV/particle for
CH3 and .08 for CH2) and hydrophobic interactions dominate in the nonpolar interior
[34]. Van der Waals interactions and other weak interactions are also critically important
for the positioning of proteins in and on the membrane. But it is not necessary
to cut the exciton energy off at the energy of van der Waals interactions, first because
the energy here is highly geometry dependent (falling off as $r^{-6}$), second because
large numbers of weak interactions can add up to an interaction significant
relative to $kT$, and third because the geometry-based specificity of these interactions
allows the membrane to self-assemble into its original form subsequent to any disruption
that would be caused by the exciton.

Life processes generally break down at 313 K, three degrees above body temperature.
However, some bacteria live in hot springs at 353 K, only 20 K below the
boiling point of water. The corresponding thermal energy is .03 eV (as compared to
.027 eV at body temperature). Exciton energies usually have a lower limit of
ca. .1 eV [21], which stands well above the thermal energy at the maximum temperature
of life.
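
The thermal energies quoted above are just $k_b T$ expressed in electron volts. The short Python sketch below reproduces them and compares them with the ca. .1 eV lower limit on exciton energies; the function name and the rounded value of Boltzmann's constant are illustrative assumptions.

```python
K_B_EV = 8.617e-5   # Boltzmann constant, eV/K (rounded)


def thermal_energy_ev(temperature_k):
    """Thermal energy k_b * T in electron volts."""
    return K_B_EV * temperature_k


kt_body = thermal_energy_ev(310.0)          # body temperature
kt_hot_spring = thermal_energy_ev(353.0)    # hot-spring bacteria
exciton_lower_limit_ev = 0.1                # lower limit quoted in the text

print(f"kT at 310 K: {kt_body:.3f} eV")
print(f"kT at 353 K: {kt_hot_spring:.3f} eV")
print(f"exciton lower limit / kT(353 K): {exciton_lower_limit_ev / kt_hot_spring:.1f}")
```

Even at the upper temperature the assumed lower exciton limit exceeds the thermal energy by a factor of about three, which is the margin the argument depends on.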


Feasibility  of  a  Bose  Condensation 

Whether proton pairs will condense in momentum space depends on the effective
mass, the density of states that can contribute to the condensation process, the stability
of the water-membrane lattice to the coupling of protons and excitons, and the
operation of the exclusion principle in the condensation process.

Effective  Mass  and  Proton  Degeneration 

The Bose-Einstein distribution passes over into the Maxwell-Boltzmann distribution
if the mean de Broglie wavelength of the particles is small compared to the mean
distance between them. The condition for strong degeneracy is met if

$$n \geq \frac{(2\pi m k_b T)^{3/2}}{h^3} \qquad (4)$$

where $n$ is particle density, $m$ is particle mass, and the expression on the right-hand
side may be identified with the inverse cube of the thermal de Broglie wavelength of
particles in an ideal gas [35, 36]. Taking $m$ as the proton rest mass
($m_{\mathrm{pr}} = 1.67 \times 10^{-24}$ g) and $T$ as body
temperature (310 K) gives a value of $n \approx 10^{24}$ protons per cubic centimeter, which
means that the distance between protons would have to be about 1 Å (assuming each
proton to be at the center of a small cube). Normal water contains about $.066 \times 10^{24}$
H atoms/cm$^3$, corresponding to a mean separation of about 2.5 Å. The shortest
hydrogen atom separation in water structure is about 1.4 Å on the average, assuming
the tetrahedral coordination of water molecules in ice. The situation is comparable
in proposed proton wires, such as hydrogen-bonded chains formed from the side
chains of proteins and bound water [19]. Hydrogen atom separations would be too
large for strong degeneration to occur if hydrogen bond lengths in the range of
2.5-3.5 Å are assumed.
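
The numbers in this paragraph can be checked directly. The following Python sketch assumes that Eq. (4) has the standard ideal-gas form $n \geq (2\pi m k_b T)^{3/2}/h^3$, uses rounded CGS constants, and takes liquid water as 1 g/cm$^3$ with a molar mass of 18 g/mol; the function names are hypothetical.

```python
import math

H = 6.626e-27      # Planck constant, erg s
K_B = 1.381e-16    # Boltzmann constant, erg/K
M_PR = 1.67e-24    # proton rest mass, g
N_A = 6.022e23     # Avogadro's number, 1/mol


def thermal_wavelength_cm(mass_g, temperature_k):
    """Thermal de Broglie wavelength h / sqrt(2 pi m k_b T)."""
    return H / math.sqrt(2.0 * math.pi * mass_g * K_B * temperature_k)


def degeneracy_density(mass_g, temperature_k):
    """Threshold density of the assumed Eq. (4): one particle per cubic wavelength."""
    return thermal_wavelength_cm(mass_g, temperature_k) ** -3


lam = thermal_wavelength_cm(M_PR, 310.0)
n_required = degeneracy_density(M_PR, 310.0)

# Hydrogen atoms per cm^3 in liquid water: two per molecule
n_water_h = 2.0 * N_A / 18.0
mean_separation_cm = n_water_h ** (-1.0 / 3.0)

print(f"thermal wavelength: {lam * 1e8:.2f} angstrom")
print(f"required density:   {n_required:.2e} per cm^3")
print(f"H density in water: {n_water_h:.2e} per cm^3")
print(f"mean H separation:  {mean_separation_cm * 1e8:.2f} angstrom")
```

Under these assumptions the required density exceeds the hydrogen density of normal water by roughly a factor of fifteen, which is why a reduced effective mass or closer packing is needed for degeneracy.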

The de Broglie wavelength of a proton moving in a complicated potential of the
type that would occur at the water-membrane interface will in general be different
than the free proton wavelength. To take this into account let us replace $m$ in Eq. (4)
by an effective mass $m^* = (d^2E/dp^2)^{-1}$ of the type that occurs in the simplest band
theory of metals. The effective mass of the proton would be increased if it were
tightly bound to an oxygen atom since in that case an excitation would be associated
with a smaller than expected change in momentum (or, alternatively, a force exerted
on it would produce a smaller than expected acceleration). If the proton is only
loosely bound to the oxygen atom, the effective mass would be smaller since a small
applied force would have a larger effect on the momentum than would be expected.

Let us consider what $m^*$ would have to be in order to satisfy Eq. (4), it being recognized
that in reality the effective mass would be a tensor quantity and that its use
here is purely conceptual given the fact that the acceleration would probably be a
very nonlinear function of the applied force. For a hydrogen atom separation of
$d = 2.5$ Å, $m^*$ must be about $m_{\mathrm{pr}}/6$. For $d = 2.8$ Å, $m^* \sim m_{\mathrm{pr}}/8$. These estimates
assume that each H+ is at the center of a small cube, as would be the case in a mixed
water-ice structure in which water molecules of average diameter 2.8 Å are loosely
packed as spheres with six nearest neighbors on the average. Pure ice is loosely
packed, with each oxygen packed as a 2.76 Å diameter sphere with four nearest
neighbors. The diameter increases slightly in water. But melting of the bonds holding
the ice crystal together allows for closer packing, yielding the well known fact that
water has a higher density than ice [24]. Extreme close packing, if it could occur,
would approximately double the density, reducing the mean separation to ca. 2 Å,
thus reducing the required value of $m^*$ to about $m_{\mathrm{pr}}/4$.
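
Because the thermal wavelength scales as $m^{-1/2}$, the effective mass needed to stretch it to a given separation $d$ is $m^*/m_{\mathrm{pr}} = (\lambda_{\mathrm{pr}}/d)^2$. The sketch below evaluates this ratio for the separations discussed; it assumes the ideal-gas form of Eq. (4) at 310 K, rounded CGS constants, and hypothetical function names.

```python
import math

H = 6.626e-27      # Planck constant, erg s
K_B = 1.381e-16    # Boltzmann constant, erg/K
M_PR = 1.67e-24    # proton rest mass, g


def thermal_wavelength_angstrom(mass_g, temperature_k=310.0):
    """Thermal de Broglie wavelength in angstroms."""
    return 1.0e8 * H / math.sqrt(2.0 * math.pi * mass_g * K_B * temperature_k)


def mass_reduction_factor(separation_angstrom, temperature_k=310.0):
    """Return m_pr / m*, the factor by which the mass must drop so that
    the thermal wavelength equals the given mean separation."""
    lam_free = thermal_wavelength_angstrom(M_PR, temperature_k)
    return (separation_angstrom / lam_free) ** 2


factors = {d: mass_reduction_factor(d) for d in (2.0, 2.5, 2.8)}
for d, factor in factors.items():
    print(f"d = {d:.1f} angstrom -> m* about m_pr / {factor:.1f}")
```

The factors come out near 4, 6, and 8 for separations of 2, 2.5, and 2.8 Å respectively, so closer packing relaxes the requirement on the effective mass but does not remove it.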

The hypothesis on which the remainder of the paper will be based is that the interaction
of water molecules with the membrane surface distorts it enough so as to degenerate
the H+ atoms in small domains on this surface. The concentration of H+
should be increased by distortion, since weakening of bonds would allow for closer
packing. The concentration should also be increased on the inside surface of the cell
membrane by fixed negative charges on proteins and by the accumulation of negative
ions on the membrane. Distortion of the water structure resulting from interleaving
with the polar groups of the membrane surface could reduce the activation barriers to
H+ migration, thereby decreasing the effective mass. Any structure on the membrane
that catalyzes the flow of protons would serve to decrease effective mass since it
would allow for small fluctuations in energy to have a greater effect on the proton
momentum than would be the case for a free proton. The effective mass would thus
be further reduced by the integration of channel and wire-like proteins into the already
facilitated hydrogen-bonded chains at the water-membrane interface. We will
picture the membrane as a network of degenerated patches that comprise a small fraction
of its surface but which are tied together by connecting channels and tunnelling
processes. This is consistent with the borderline character of proton degeneration and
with the fact that degeneration of all the protons on the membrane would interfere
with normal chemistry. The time between hops should be shorter for protons with a
lower effective mass, and as a consequence the pairing interaction should become
stronger under conditions that favor degeneration. But more to the point, the delocalization
connected with degeneration is incompatible with a simple hopping picture.

Transition  Temperature,  Lattice  Stability,  and  Density  of  States 

If these patches and connecting channels were strictly analogous to a weak coupling
superconductor we could write

$$T_c = 1.14\,\frac{\hbar\omega_{\mathrm{ex}}}{k_b}\,\exp\!\left[-\frac{1}{N(\mu)V}\right] \qquad (5)$$

where $T_c$ is the critical temperature and $N(\mu)$ will be identified with the density of
states available for contributing to condensation of protons of a given spin. Weak
coupling means that $N(\mu)V = N(\mu)\,|V_{\mathrm{ex}} + V_{\mathrm{Coul}}| \ll 1$, where $V$ here denotes the
magnitude of the net interaction. In the metal
case $N(\mu)$ is the density of available states as a function of kinetic energy of electron
pairs measured from the Fermi level, but approximated by the density of single spin
states at the Fermi level at $T = 0$ K. In the proton case we will take $N(\mu)$ to be the
density of available states as a function of the kinetic energy of the proton pairs, but
approximated by the density of single spin states at the chemical potential of the lowest
energy unpaired state of mobile hydrogen bonds in the membrane-water system
under consideration. As $N(\mu)$ increases so does the number of transitions between
momentum states $(p_1, p_2)$ and $(p'_1, p'_2)$ of two protons that conserve total momentum
and that are therefore compatible with a phase transition. As in the metal case, the
maximum occurs when the members of a pair have equal but opposite momenta.

According to Eq. (3) the proton energy $\hbar\omega_{\mathrm{pr}}$ as measured from the chemical potential
$\mu$ (actually the proton energies before and after scattering) must be less than
$\hbar\omega_{\mathrm{ex}}$. The energy $\hbar\omega_{\mathrm{ex}}$ thus enters Eq. (5) as a cutoff of the range of pairing energies
under the simplifying assumption that the pairing energy is essentially constant
whenever this cutoff is satisfied. The term $\hbar\omega_{\mathrm{ex}}$ can be thought of as the width of a
potential well in which the protons are interacting, whereas $N(\mu)V$ can be thought of
as its depth.

Some models of metallic superconductivity set the upper limit of $N(\mu)V$ as 1/2
(see for example Ref. 21). The existence of an upper limit is due to the fact that too
strong a coupling between electrons and phonons destabilizes the lattice, thereby
leading to lower phonon frequencies. The situation is different in the membrane-water
case in that the coupling between protons and excitons occurs in hydrogen
bond networks that are distinct from the membrane structures in which the electronic
oscillations occur. Furthermore the membrane structures are comparatively massive
and the electrons that mediate the excitonic interaction are bound with covalent bond
strengths. As a consequence there is no reason to suspect that $N(\mu)V$ is subject to
restrictions more stringent or even as stringent as in a metal.

For  the  moment  accepting  Eq.  (5)  as  indicative  of  the  constraints  on  and 
N{fi)V  and  taking  Tc  as  body  temperature,  we  obtain  the  numbers  in  Table  1.  If  hu>ex 
is  set  at  2  eV,  the  maximum  compatible  with  covalent  bonding,  we  can  get  by  with  a 
fairly  small  value  of  N(fi)V;  but  this  case  is  incompatible  with  our  assumption  that 
should  not  exceed  hydrogen  bond  strengths.  If  N(fi)V  is  set  equal  to  1  (violating 
the  weak  coupling  condition)  hoiex  ~  .06  eV,  which  is  lower  than  necessary  in  com¬ 
parison  to  hydrogen  bond  strengths  and  likely  to  be  masked  by  vibratory  activity  at 
the  border  of  the  far  infrared.  If  N ({i)V  is  set  equal  to  1/2,  a  possible  value  in  metal¬ 
lic  models,  we  get  values  of  h<a„  that  are  slightly  less  than  aqueous  hydrogen  bond 
energies,  corresponding  to  a  transient  shift  in  the  ground  state  energy  of  membrane 
electrons  of  ca.  1.4  x  103  cm-1  to  ca.  1.5  x  103  cm"’.  These  are  physically  reason- 

Table  I.  Exciton  energies  that  would  cor¬ 
respond  to  different  coupling  parameters  if 
Eq.  (5)  were  valid  for  the  water-membrane 
interface  (parenthesized  value  is  at  350°  K). 


N(n)V 

ftw„  (ev) 

MA) 

.23 

2.00 

ca.  6  x  10' 

.50 

.17  (.19) 

ca.  7  x  10‘ 

.68 

.10 

ca.  I  x  105 

1.00 

.06 

ca.  2  x  105 
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able  exciton  energies  in  that  they  satisfy  the  bond  energy  and  thermal  constraints  dis¬ 
cussed  earlier.  It  is  logically  possible  that  the  membrane-water  lattice  would  melt  at  a 
temperature  that  would  otherwise  be  below  the  critical  temperature.  However,  to  the 
extent  that  any  higher  value  of  Tc  would  increase  hcorJI  we  can  conclude  that  N(p.)V 
cannot  be  much  less  than  1/2  if  Eq.  (4)  is,  in  fact,  applicable. 
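The entries of Table I can be approximated with a short calculation. The sketch below assumes that Eq. (5) has the standard weak-coupling form $kT_c = 1.14\,\hbar\omega_{ex}\exp[-1/N(\mu)V]$; that form is an assumption made here for illustration, since the equation itself is not reproduced in this section. The function names and the choice of 310 K for body temperature are likewise illustrative.

```python
import math

K_B_EV = 8.617333e-5        # Boltzmann constant in eV/K
HC_EV_ANGSTROM = 12398.42   # h*c in eV*angstrom
CM1_PER_EV = 8065.54        # wavenumber (cm^-1) equivalent of 1 eV


def exciton_energy_ev(coupling, t_c=310.0):
    """Solve the ASSUMED form k*T_c = 1.14*E*exp(-1/coupling) for E (in eV)."""
    return K_B_EV * t_c / 1.14 * math.exp(1.0 / coupling)


def wavelength_angstrom(energy_ev):
    """Photon wavelength (angstrom) equivalent to an energy given in eV."""
    return HC_EV_ANGSTROM / energy_ev


if __name__ == "__main__":
    # Coupling values taken from Table I; T_c = 310 K is an assumed body temperature.
    for coupling in (0.23, 0.50, 0.68, 1.00):
        energy = exciton_energy_ev(coupling)
        print(f"N(mu)V={coupling:.2f}  E={energy:.2f} eV  "
              f"lambda={wavelength_angstrom(energy):.1e} A  "
              f"wavenumber={energy * CM1_PER_EV:.1e} cm^-1")
```

Under these assumptions the exciton energy rises steeply as the coupling parameter falls, which is the inverse exponential dependence the argument relies on.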

Let us now compare the values of $N(\mu)V$ for conduction electrons to those that might be achieved by H+ bonds. We will concentrate on $N(\mu)$ since there is no reason to suspect unfavorable limitations on $V$ given the fact that membrane proteins can be tailored for highly specific interactions. The conduction electrons in a metal move in a periodic potential even though they may be treated in a free electron model as an ideal Fermion gas with the energy of each particle given by $E = p^2/2m$. The mobile hydrogen bonds at the water-membrane interface also move in a spatially varying potential, determined by the positions of the oxygen atoms. The difference is that the translational symmetry of the H+ momentum is badly broken by the ca. .2 eV interaction with oxygen and there are some mutual interactions among the hydrogen bonds that facilitate their motion. For the purposes of a rough comparison we will nevertheless begin with the overidealization that the H+ bonds can be treated as free Fermi particles, later modifying the picture to ascertain the qualitative effect of the H+–O- and the local H+, H+ interactions.

The density of single Fermion states of one spin orientation per unit volume is given by

$$N(E)\,dE = \frac{2\pi}{h^{3}}\left(\frac{2m}{\alpha}\right)^{3/2} E^{1/2}\,dE \qquad (6)$$

where the energy of the particle ranges from $E$ to $E + dE$ and $E = \alpha p^2/2m$ [37]. For free particles $\alpha = 1$ and for particles subject to a potential $\alpha$ may be defined as $m\,d^2E/dp^2 = m/m^*$. The energy of interest in metals is the chemical potential of the electrons (i.e., the Fermi energy, $E_f$). Since this is high relative to room temperature (ca. 10 eV for a typical metallic superconductor), $N(E_f)$ is essentially equal to its zero temperature value and is essentially constant over the region of interest. If for the moment we assume that the chemical potential of the mobile H+ is not too different if measured from the highest or lowest energy nonhopping state, we can identify it with the translational kinetic energy used in the proton wire argument presented earlier. This gives a value of $E_{pr} \approx 4 \times 10^{-4}$ eV, corresponding to the velocity $v_{pr} \approx 2.8 \times 10^{4}$ cm/s. Setting $\alpha = 1$ gives the ratio $N(\mu_{pr})/N(\mu_{el}) \approx (m_{pr}/m_{el})^{3/2}(E_{pr}/E_f)^{1/2} \approx 500$. Idealizing the protons as free particles evidently leads to an increase in the density of states as compared to free electrons, since the increase in mass dominates the decrease in kinetic energy. If $\alpha = 4$, corresponding to an effective mass that could allow for degeneration, and $v_{pr}$ increased accordingly (as discussed in the next section), the density of proton states would still have an approximately 350-fold advantage.
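The free-particle estimate above can be checked numerically. The sketch below assumes only the scaling $N \propto m^{3/2}E^{1/2}$ stated in the text for $\alpha = 1$ and uses standard values of the physical constants; the function names are illustrative, and the $\alpha = 4$ case is not treated because it depends on the velocity adjustment discussed in the next section.

```python
M_PROTON_KG = 1.67262e-27          # proton mass
M_PROTON_OVER_M_ELECTRON = 1836.15 # proton-to-electron mass ratio
EV_IN_JOULES = 1.602177e-19


def kinetic_energy_ev(velocity_cm_s, mass_kg=M_PROTON_KG):
    """Translational kinetic energy (eV) of a particle moving at velocity_cm_s."""
    v_m_s = velocity_cm_s * 1e-2
    return 0.5 * mass_kg * v_m_s ** 2 / EV_IN_JOULES


def density_of_states_ratio(mass_ratio, energy_ratio):
    """Ratio of free-particle densities of states, N ~ m^(3/2) * E^(1/2)."""
    return mass_ratio ** 1.5 * energy_ratio ** 0.5


if __name__ == "__main__":
    e_pr = kinetic_energy_ev(2.8e4)   # proton velocity quoted in the text
    e_f = 10.0                        # Fermi energy quoted in the text, eV
    ratio = density_of_states_ratio(M_PROTON_OVER_M_ELECTRON, e_pr / e_f)
    print(f"E_pr = {e_pr:.1e} eV, N(pr)/N(el) = {ratio:.0f}")
```

The mass factor contributes roughly $8 \times 10^{4}$ and the energy factor roughly $6 \times 10^{-3}$, which is why the heavier particle wins despite its far smaller kinetic energy.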

The assumption that $E_{pr}$ is equal to the kinetic energy is equivalent to burying the highest and lowest energy of the unbroken hydrogen bond in the potential in which H+ is traveling. This corresponds to the fact that the kinetic energy of an H+ in the field of an oxygen is reduced as compared to that of a free proton of the same total energy. This would appear to increase $m^*$ as compared to $m_{pr}$, which would mean an increase in the density of states at the expense of the required degeneracy of the protons. This would be the case in normal water. In terms of the naive analogy to the band picture the attraction to O- can be thought of as compressing the H+ states into a narrower band, so that in normal water a comparatively long residence time is naturally associated with an increase in the density of states. However, our assumption is that the activation barrier for proton hopping is reduced at the water-membrane interface due to distortion of the water structure and mixing of aqueous and nonaqueous hydrogen bonds. This would expand the H+ states into a wider band, leading to a decrease in the density of states. The fluctuation energies required to put the proton into this band (and hence to put it in a position to be captured by a neighboring oxygen) should be small enough to effectively delocalize the proton. The local H+, H+ interaction, insofar as it facilitates the jump to a new O-, would also have a decreasing effect on the effective mass and on the density of states.

Three further points bear on the applicability of Eq. (5). The first is that this equation results from an integration that assumes the constancy of $N(\mu)$. This assumption cannot be correct for the membrane-water lattice due to the high specific heat of water. Thus the configurational component of the thermal energy of pure liquid water increases at a constant slope of ca. $.3 \times 10^{-2}$ eV/H+ per degree Kelvin over the whole range from the freezing to the boiling point of water [see Fig. 4.12 in Ref. 30]. This alters the form of Eq. (4), but not the inverse exponential character of the dependence of $T_c$ on $N(\mu)V$. Furthermore, the specific heat would probably be decreased in degenerated regions. The second point is that we have not considered the potential distortional correlations of H+ bonds. These should be of minor importance in the case of degenerated protons, but in any case would increase the density of states in the energy range of these distortions. The third point is connected with the concentration of mobile protons. The concentration of conduction electrons in a superconducting metal is typically $10^{22}$ cm$^{-3}$, not much larger than the density of states at the Fermi surface (given by $N(\mu_{el}) \approx 10^{21}$ eV$^{-1}$ cm$^{-3}$). The concentration of
protons mobile at any given time in normal water is about $10^{8}$-fold smaller. The proper comparison, however, should be to the concentration of degenerated protons in local patches and channels. This is about $10^{23}$ H+/cm$^{3}$, an order of magnitude larger than the conduction electron concentration.

If we drop the weak coupling assumption and still wish to adhere to the metal analogy, Eq. (5) can be replaced by phenomenologically more flexible formulae of the McMillan type [22,38]. The main alteration is that $\lambda = N(\mu)V$ would be replaced by the renormalized interaction parameter $\lambda/(1 + \lambda)$ (with $\lambda$ redefined). The Coulomb repulsion is reduced in the case of the phonon-electron interaction, with the degree of reduction decreasing with increasing phonon frequency. This effect should be smaller in the case of the exciton-proton interaction due to the larger mass of the proton. The important point, for the present purposes, is that the qualitative dependence of $T_c$ on $N(\mu)V$ remains the same. However, the differences between metal and water are undoubtedly sufficient to disturb the corresponding-states aspect of critical temperature equations that have been developed for metals.

Role  of  the  Exclusion  Principle 

The thermal velocity of a classical gas with particles of proton mass is $v_{c,pr} = (3kT/m_{pr})^{1/2} \approx 2.8 \times 10^{5}$ cm/s. This is an order of magnitude larger than the velocities estimated in the discussion of Proton Motions in Normal Water and, therefore, consistent with the expected velocity-decreasing effect of the interaction with O-. The velocity of protons treated as an ideal Fermi gas at 0 K would be $(\hbar/m_{pr})(3\pi^2 n)^{1/3} \approx .76 \times 10^{2}$ cm/s if $n$ is taken as the concentration of mobile protons in pure water at liquid phase temperatures. Thus if mobile H+ bonds could be viewed as an ideal Fermi gas we would have an inversion of the metal case, the thermal velocity of free electrons in a metal being an order of magnitude less than the zero temperature Fermi velocity. This is why the hopping motion of protons in normal water must be a fluctuation phenomenon rather than a consequence of their being forced into a high minimum energy state by the Pauli exclusion principle, as is the case for free electrons in a metal. The high specific heat of water has its origin in this difference, that is, in the fact that it is the distortion and breakage of hydrogen bonds that allows the configurational contribution to the internal energy of water to be somewhat more than double the vibrational contribution [30].

Degeneration of the protons would alter this picture substantially. Suppose, again as a very rough approximation, that the protons in degenerated patches on the membrane could be treated as an ideal Fermi gas so far as their velocity is concerned. The zero temperature Fermi velocity would then be ca. $3.2 \times 10^{5}$ cm/s, where $n$ is taken as the total concentration of H atoms under the assumption of tight packing and $m_{pr}$ is replaced by $m^* = m_{pr}/4$. This is the same order of magnitude as the classical thermal velocity (which would increase to ca. $5.6 \times 10^{5}$ cm/s if we use the effective mass). Though these estimates are unreliable, they do indicate that the flow of protons would be much more influenced by the exclusion principle under our degenerate patch assumption than would be the case in normal water. The flow should become much less irregular, reducing the problem of maintaining proton-proton correlations between hopping events. This is why the specific heat of “superwater” patches and channels should be decreased as compared to normal water. The distinction between mobile and nonmobile proton pools would also become blurred. Since protons are identical particles it is, of course, operationally impossible to determine which pool a particular proton belongs to in any case.
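The velocity estimates in this section follow from two textbook expressions, the classical thermal velocity $(3kT/m)^{1/2}$ and the zero temperature Fermi velocity $(\hbar/m)(3\pi^2 n)^{1/3}$. The sketch below evaluates both. The input concentrations are assumptions chosen for illustration: about $6 \times 10^{13}$ cm$^{-3}$ for mobile protons in pure water (corresponding to $10^{-7}$ mol/L) and about $6.7 \times 10^{22}$ cm$^{-3}$ for all H atoms in liquid water; a temperature of 310 K is also assumed.

```python
import math

HBAR = 1.054572e-34      # reduced Planck constant, J*s
K_B = 1.380649e-23       # Boltzmann constant, J/K
M_PROTON = 1.67262e-27   # proton mass, kg


def thermal_velocity_cm_s(temp_k, mass_kg):
    """Classical thermal velocity sqrt(3kT/m), returned in cm/s."""
    return 100.0 * math.sqrt(3.0 * K_B * temp_k / mass_kg)


def fermi_velocity_cm_s(n_per_cm3, mass_kg):
    """Zero temperature Fermi velocity (hbar/m)*(3*pi^2*n)^(1/3), in cm/s."""
    n_per_m3 = n_per_cm3 * 1e6
    k_fermi = (3.0 * math.pi ** 2 * n_per_m3) ** (1.0 / 3.0)
    return 100.0 * HBAR * k_fermi / mass_kg


if __name__ == "__main__":
    # Normal water: thermal velocity far exceeds the Fermi velocity.
    print(f"thermal (310 K): {thermal_velocity_cm_s(310.0, M_PROTON):.2e} cm/s")
    print(f"Fermi, mobile protons: {fermi_velocity_cm_s(6.0e13, M_PROTON):.2e} cm/s")
    # Degenerated patch: all H atoms, effective mass m*/m = 1/4 (assumed inputs).
    m_eff = M_PROTON / 4.0
    print(f"Fermi, degenerate patch: {fermi_velocity_cm_s(6.7e22, m_eff):.2e} cm/s")
    print(f"thermal with m*: {thermal_velocity_cm_s(310.0, m_eff):.2e} cm/s")
```

With these inputs the two velocities differ by several orders of magnitude in normal water but become comparable in the degenerate patch, which is the contrast the argument turns on.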

[...] the increase in kinetic energy of electrons due to phonon exchange is less than the decrease in potential energy
due  to  pairing.  The  Fermi  level  is  replaced  by  the  chemical  potential  of  a  single  pool 
of  movable  hydrogen  bonds  in  the  water-membrane  case  and  it  is  this  level  that  is 
unstable  to  pairing.  This  pool  should  be  equivalent  to  the  pool  of  exclusion  principle 
supported  conduction  electrons  so  far  as  the  stability  of  its  ground  energy  level  to  a 
pairing  interaction  is  concerned,  as  long  as  the  increase  in  kinetic  energy  of  the  pro¬ 
tons  due  to  exciton  exchange  is  less  than  the  decrease  in  potential  energy  due  to 
pairing. 

The  exclusion  principle  plays  a  related  role  in  the  water-membrane  model  which  is 
important  and  analogous  to  its  role  in  the  metal  case.  Each  electron  in  the  metal  must 
be  in  a  distinct  single  electron  level  state.  This  is  true  both  before  and  after  pairing. 
However,  when  pairs  form  it  also  becomes  necessary  to  occupy  the  pair  levels.  Since
the  pairs  have  integral  spin,  multiple  occupancy  is  possible  and  the  pairs  can  con¬ 
dense  into  the  lowest  energy  pair  level.  Similarly,  if  proton  pairs  form  they  must  oc¬ 
cupy  pair  levels  distinct  from  the  levels  occupied  by  the  individual  protons.  Since  the 
pairs  are  bosons  they  would  tend  to  condense  into  the  lowest  lying  pair  state.  As  pairs 
they  have  to  respect  conflicts  arising  from  the  Coulomb  interaction,  but  not  conflicts 
arising  from  the  exclusion  principle. 

As with electronic superconductors, proton superflows can be thought of in terms of a two-fluid model in which a condensate of pairs is the true ground state and the normal or unpaired protons are excitations of this ground state. Above the critical temperature all the mobile H+ bonds would exist in the excited states, if in fact a critical temperature exists below the melting point of the water-membrane lattice.

The singlet state is favored in the case of metal electrons due to the stabilizing effect of exchange interactions. Such exchange effects would occur in the bound water case, but in much smaller degree due to the larger mass of the proton. Thus the occurrence of triplet-state proton pairs cannot be excluded. The coexistence of singlet and triplet pairs would yield a condensed state with enriched hydrodynamic behavior.

Pair  Separation 

The  disanalogies  between  bound  water  and  metal  suggest  that  the  condensed  state 
could  differ  substantially  in  a  number  of  respects.  The  key  feature  is  the  spatial  range 
of the pair wave function. This is large in a metal, typically ca. $10^{3}$ Å. The primary
controlling  factor  is  the  electron  velocity,  this  being  ca.  101  times  larger  than  the 
phonon  velocity.  The  electron  velocity  is  restricted  to  a  narrow  range,  due  to  the 
reasonable requirement that the $e^-$–$e^-$ wavefunction be a superposition of single $e^-$ wavefunctions within an energy range of the Fermi energy on the order of the energy gap (see [39]). As a consequence of this restriction the spatial range and energy gap are roughly connected by the uncertainty formula $(\xi_{el}/v_{el})\Delta_{el} \approx \hbar$, where $\Delta_{el}$ is the energy gap, $\xi_{el}$ is the spatial range, $v_{el}$ is the Fermi velocity, and $(\xi_{el}/v_{el}) \approx \tau_{el}$ can be interpreted as the approximate time required for the two electrons to interact with a phonon.

A similar restriction on proton velocities is not implausible for water due to the high density of states close to the lowest energy unpaired H+ state. However, the exciton velocity can be very much higher than the phonon velocity and very much higher than the effective proton velocity. As a consequence $\xi_{pr}$ is primarily controlled by $v_{ex}$ rather than $v_{pr}$, destroying the simple relation to $\Delta_{pr}$. But, to the extent that $v_{ex}$ is the dominant factor we can take the approximate time for the two protons to interact with the exciton as given by $\tau_{pr} \approx \xi_{pr}/v_{ex}$, subject to the uncertainty principle constraint $\tau_{pr} \approx \hbar/(\hbar\omega_{ex})$. Thus, $\xi_{pr} \approx \hbar v_{ex}/(\hbar\omega_{ex}) \approx 35 \times 10^{-16}\,v_{ex}$. If we set $\xi_{pr} \approx \xi_{el}$, then $v_{ex}$ must be ca. $3 \times 10^{9}$ cm/s, about 1/10 of the speed of light. This is quite high, especially as it would require the period of the electronic oscillation to be ca. $10^{-17}$ s. If $\xi_{pr}$ is reduced to a value between 10 and 100 Å we obtain the more plausible oscillation period in the range $10^{-14}$ to $10^{-15}$ s. Since values of $\xi_{pr}$ much less than 10 Å are unlikely due to the Coulomb repulsion, we can set a lower bound of ca. $3 \times 10^{6}$ cm/s on the exciton velocity, assuming side groups on the membrane are separated by a water molecular diameter. We recall, from our discussion in Spectroscopic and Bond Energy Constraints, that $10^{-15}$ s is a typical electronic rearrangement time, giving a maximum velocity of $3 \times 10^{7}$ cm/s.
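The relation between pair separation and exciton velocity used above can be sketched numerically. The code assumes the uncertainty estimate $\xi_{pr} \approx \hbar v_{ex}/(\hbar\omega_{ex})$, an exciton energy of .19 eV (one of the Table I values), and a water molecular diameter of about 3 Å for converting a velocity into a transit time; the function names and these input values are illustrative assumptions, not figures fixed by the text.

```python
HBAR_EV_S = 6.582120e-16   # reduced Planck constant in eV*s
ANGSTROM_CM = 1e-8         # one angstrom in cm


def interaction_time_s(exciton_energy_ev):
    """Uncertainty-limited interaction time, hbar / (hbar*omega_ex)."""
    return HBAR_EV_S / exciton_energy_ev


def exciton_velocity_cm_s(xi_angstrom, exciton_energy_ev=0.19):
    """Velocity needed for a pair separation xi, from xi = v * tau."""
    return xi_angstrom * ANGSTROM_CM / interaction_time_s(exciton_energy_ev)


def transit_time_s(velocity_cm_s, spacing_angstrom=3.0):
    """Time to cross one assumed side-group spacing at the given velocity."""
    return spacing_angstrom * ANGSTROM_CM / velocity_cm_s


if __name__ == "__main__":
    for xi in (1000.0, 100.0, 10.0):
        v = exciton_velocity_cm_s(xi)
        print(f"xi = {xi:6.0f} A  v_ex = {v:.1e} cm/s  "
              f"transit time = {transit_time_s(v):.1e} s")
```

The velocity scales linearly with the assumed pair separation, so the numerical bounds shift directly with whatever separation and spacing are adopted.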

Though very approximate, this argument does strongly suggest that the spatial extent of the pair wavefunction is at least an order of magnitude smaller in the bound water than in the metal case. If the exciton velocities could become extremely large the pair separation would still be limited by damping. As the salt concentration decreases, the pair separation should increase due to the decreased effect of screening.

Zero Voltage Supercurrents

Biological systems are probably too complicated for observations involving magnetic fields, microwaves, or exotic hydrodynamic behavior to provide more than suggestive evidence for a proton superfluid. However, it might be possible to perform a critical experiment due to the fact that phase coherence of the type that is responsible for the DC Josephson effect is an unambiguous manifestation of the superfluid state. We assume, in analogy to the argument made by Josephson [40], that the proton superfluid is characterized by an order parameter $\psi(\mathbf{r})$ with a definite phase that may be different on the two sides of the insulating membrane. If a large number of degenerated channels (presumably associated with integral proteins) cross the membrane it should be possible for the long-range order of the phase to cross as well. According to the Ginzburg-Landau equation for the free energy density of a superconductor, the channels should then increase the free energy of the system as a result of the variation in the phase by $(1/2m)\,|{-i\hbar\nabla\psi}|^{2}$ (omitting, however, the dependence on the magnetic vector potential). If the voltage is clamped at a zero potential difference, current should flow to reduce this free energy and would continue to flow if the number of protons on each side were maintained constant by being circuited with the help of a battery.

This effect, though definitive, could be extremely small due to the fact that the total number of degenerated H+ atoms on the membrane must be small and due to the fact that the number density of paired charge carriers, $n_s(\mathbf{r}) = |\psi(\mathbf{r})|^{2}$, might comprise a small fraction of these. The effect would also be smaller for protons than electrons since the free energy of the channel current density decreases with increasing mass. Simple tunneling could not be the correct picture in normal water since protons hop across the membrane through a sequence of tunneling events facilitated by the proton channel in this case. When degeneration sets in, however, the analogy to electrons tunneling across a thin barrier should be more applicable.

A small AC Josephson effect could also occur. A constant potential across a Josephson junction induces a supercurrent with frequency $\omega_J = 2QV/\hbar$, provided no other mechanism of dissipation is available. $V$ is typically on the order of millivolts, corresponding to emitted radiation in the short wavelength microwave range [see Ref. 41]. The resting potential of most cells is in the range -10 to -100 mV, with some cells going up to -200 mV [28]. It is possible that an AC type Josephson effect is responsible for the high voltage sensitivity of channel proteins in the membrane that serve as Na+ and K+ gates [see Ref. 42]. For nerve and muscle cells the resting potential is typically -60 to -70 mV. The threshold for firing is 15-20 mV less negative than this, with Na+ channel proteins beginning to respond when the depolarization exceeds 7 mV. It is therefore conceivable that the voltage sensor presumed to control Na+ and K+ gates responds to $\omega_J$ rather than to the field itself, it being recognized that the current is statistically less well defined than in the metal case. Alternatively, in the presence of a suitably adjusted dissipative process the depolarization could induce a DC proton current that reorients the gates to an open position. In the absence of such an alternative mode of dissipation the current oscillations would produce radiation in the short wavelength microwave range, the wavelengths increasing as other modes of dissipation are introduced. This may be connected to the disorienting effects of microwave radiation, and a functional role for such membrane potential governed current oscillations should not be excluded. These types of effects, in which phase order crosses a membrane, would also confer a high sensitivity to magnetic fields.
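The frequency scale implied by the AC Josephson relation can be illustrated with a few lines of code. The sketch below evaluates $f = \omega_J/2\pi = 2QV/h$, taking $Q$ to be the elementary charge (the magnitude of the proton charge); the function names are illustrative, and the sample voltages are simply a 1 mV junction value and a 7 mV depolarization taken from the discussion above.

```python
E_CHARGE = 1.602177e-19     # elementary charge, C
H_PLANCK = 6.626070e-34     # Planck constant, J*s
C_LIGHT_CM_S = 2.997925e10  # speed of light, cm/s


def josephson_frequency_hz(voltage_v, charge_c=E_CHARGE):
    """AC Josephson frequency f = 2*Q*V/h for pairs of charge 2*Q."""
    return 2.0 * charge_c * abs(voltage_v) / H_PLANCK


def free_space_wavelength_cm(frequency_hz):
    """Free-space wavelength of radiation at the given frequency."""
    return C_LIGHT_CM_S / frequency_hz


if __name__ == "__main__":
    for millivolts in (1.0, 7.0):
        f = josephson_frequency_hz(millivolts * 1e-3)
        print(f"V = {millivolts:4.1f} mV  f = {f:.2e} Hz  "
              f"wavelength = {free_space_wavelength_cm(f):.2e} cm")
```

Because the frequency is strictly proportional to the voltage, any sensor responding to $\omega_J$ would translate small changes in membrane potential into proportionally shifted oscillation frequencies.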


Biological  Implications 

The primary constituent of biological tissues (from 60 to 90%) is water and a significant fraction of this is in bound form. The lipid bilayer is particularly important because of its ubiquity as a cellular and intracellular structure. However, the basic elements of the proposed interaction would be present wherever bound water occurs, that is, wherever a nonpolar moiety is attached to a polar moiety, which in turn attracts a film of water. This is the case with free proteins, which bind water in the form of a sphere of hydration, and in networks of protein structures (microtubules and microfilaments) that form the membrane and cellular cytoskeletons. Hydrogen bonds contribute to the conformation of nearly all macromolecules in the cell and contribute to the mechanism of action in many cases. The pairing interaction extends with only minor modifications to these nonaqueous hydrogen bonds. Many of these undoubtedly exchange with mobile protons of the aqueous hydrogen bond network. Yet a small superproton fraction in this elaborate network could easily be hidden by the fact that the tools traditionally used for establishing the existence of a superfluid state have complex functional effects in biological organisms.

Standing above this ambiguity, however, is the manifest coherence of biological materials. The question is whether this coherence is based in part on a physically coherent substrate, such as a proton superfluid. Except for zero-voltage supercurrents, which are likely to be very small, it is probably impossible to separate the existence of a proton superfluid from its complex functional roles, and it is therefore pertinent to consider how a superflow could interface with known mechanisms of biochemistry to contribute in a distinctive manner to biological structure and function. We consider phenomena at three levels of organization.

1. Coordination of intracellular events. According to Mitchell’s chemiosmotic hypothesis proton flow across the membrane is a key contributor to energy transduction processes such as photosynthesis and oxidative phosphorylation [43]. Proton superflow could serve to provide a dynamic skeleton that coordinates proton movements and associated electron movements in energy transfer processes even if most of the protons involved were unpaired most of the time. Such a skeleton would also serve to order the motion of other ions and could contribute to persistent and sometimes coordinated motions exhibited by both large and small molecules. A partial list of such phenomena includes persistent conformational motions of membrane proteins, translational motions of proteins associated with membrane fluidity, motions of nucleic acids and protein enzymes between sites of action in protein biosynthesis, relative motions of enzyme and substrate in the recognition process, cytoplasmic streaming, and axoplasmic transport. If such motions are controlled in part by the dynamic order of an underlying proton superflow they could be significantly modified by the production of proteins that tailor the pairing interaction differently, allowing for very different dynamic organizations in different types of cells or in different phases of the cell cycle.

2. Cellular pattern recognition. Dynamic order may be required to fully explain the ability of cells to respond appropriately to conditions and objects in their environment. This capability reaches highly specialized forms in CNS neurons, which must respond to patterns of transmitter and mediator input, and in immune system cells, which must produce antibody or take other actions in response to the pattern of antigen and hormones impinging on their external membrane. The surfaces of all cells are dotted with receptors that respond to specific molecular shapes, largely through a “lock-key” type mechanism involving $r^{-6}$ van der Waals interactions. They may also respond to local physical perturbations, such as stretching. Some of these receptors produce internal messenger molecules, in particular cyclic nucleotides and calcium, which in turn can trigger events that modify proteins on the membrane and in the cytoskeleton [44]. The recognition of patterns distributed over significant regions of space or time probably involves elaborate transmission and processing of signals from the different receptors. The problem of pooling all this information would be greatly reduced and the recognition power greatly increased by a superfluid network of mobile H+ bonds. The rigid aspect of such a network would allow it to respond to patterns of membrane perturbation and receptor activity over a much larger region of space than could any single macromolecule, in effect extending the applicability of the lock-key metaphor to macroscopic dimensions.

The  phase  coherence  of  proton  superflow  could  serve  as  an  initial  screening  mecha¬ 
nism  for  distinguishing  self  from  not  self  if  the  not-self  entity  (say  a  virus  or  another 
cell)  is  covered  by  a  layer  of  bound  water.  Proton  flows  accompanying  the  disappear¬ 
ance  of  the  phase  difference  between  the  two  superflows  subsequent  to  contact  could 
serve  to  alert  the  specific  molecular  and  cellular  mechanisms  already  known  to  play  a 
role  in  rejecting  undesirable  foreign  objects. 

3. Multicellular pattern formation. All cells in a multicellular organism must have positional information in order to differentiate and move in a manner that leads to the development and maintenance of proper three-dimensional form. Cellular pattern recognition contributes to this insofar as it enables cells to ascertain their location in the organism from local chemical influences impinging on them. In addition, close junctions between membranes of different cells (in particular desmosomes) could serve to extend proton superflows into the intercellular networks, thereby lending stability, integrity, and unity to the organism.

It is not difficult to extend the list of interface mechanisms mentioned above. The discussion is intended to illustrate how proton superflow can be used to construct “order-from-order” models of biological phenomena. The principle of order from order was first proposed by Schrödinger [1] in his discussion of the analogy between life processes and low temperature phenomena. It is possible to construct models that exhibit all of the phenomena mentioned in some degree without invoking an underlying superflow. The dynamic order in such models is based on the dissipation of energy in the presence of suitable nonlinearities. It is undoubtedly the case that such order through dissipation is of profound importance in biology [see Ref. 45]. In the proton superflow model energy dissipation is required to create the macromolecules that make up the membrane. These molecules self-assemble into a structurally ordered membrane in the presence of water through a free energy minimization that is dominated by potential energy, in the fashion of crystalline order. The importance of such potential energy-based order in biological materials is due to the fact that macromolecules are big enough to specifically stick to one another and small enough to interact by means of diffusion. If the pairing mechanism proposed in this paper is operative, the dynamic order of proton superflow is inherent in the structural order of the water-membrane interface. No extra energy cost is required. As would be expected, models in which some of the observable order arises from the order in an underlying superflow would thus have enhanced capabilities as compared to models in which all of the dynamic order requires an energy expenditure over and above that required to produce the structural order.

Abstract

Proton transfers in water clusters $(\mathrm{H_2O})_n$ ($n$ = 2, 3, 4, 5, 6) and in the DNA base pairs adenine-thymine and guanine-cytosine have been studied using the potential functions of the polarization model for water and the ab initio SCF method with the STO-3G basis set for the base pairs, respectively.


Introduction 

There are many studies on proton transfers in a wide array of chemical and biological processes [1] due to the prevalence of hydrogen bonding [2] in chemical and biological systems. Some studies on proton transfer between water molecules have been done by ab initio [3-13] or semiempirical methods [14-17]. Studies on multiple hydrogen bonding systems as well as single hydrogen bonding systems are also important for explaining a number of biologically important phenomena, such as the base pair interactions in DNA, which are essential to understanding the character of the genetic code [18, 19]. Löwdin [20-22] has suggested that hydrogen-bonded protons can be transferred from one base of DNA to its complement and that proton tunneling may play a key role in mutagenesis. Namely, in the mechanism of cell duplication, the hydrogen bonds in the double helix are released, the two DNA strands become at least partly free, and each one produces, by means of enzymes and available material, its own complement, so that two complete DNA molecules with the same base sequence as the original one are produced. The complementary nature of the DNA molecule explains the stability of the genetic message and how it is propagated during cell duplication. According to Löwdin’s work, there is a possible error mechanism in the replication of the genetic code as a consequence of tautomer formation. The proton-electron pair codes of the normal and rare tautomeric nucleotide bases are shown in Figure 1. Therefore, potential energy profiles for the proton transfers in DNA base pairs are important for understanding how mutational lesions in DNA can be caused.

In this study, we have studied proton transfers in water clusters with optimum
geometries, (H2O)n (n = 2, 3, 4, 5, 6), using the potential functions of the
polarization model for water [23, 24], because the studies done until now have been
concerned principally with the water dimer and linear chains of water polymer. We have
also calculated the potential energies by the ab initio SCF method with the STO-3G
basis set and obtained the potential energy profiles for proton transfers in the
adenine-thymine and guanine-cytosine DNA base pairs. In addition, the possibility of
Löwdin's error mechanism in genetic code replication is discussed.

Figure 1. The proton-electron pair code of (a) normal, (b) rare tautomeric, (c) normal
and rare tautomeric nucleotide bases.


Methods  of  Calculations 


Water  Clusters 

The polarization model is used to calculate potential energies for proton transfer
reactions in water clusters, (H2O)n (n = 2, 3, 4, 5, 6). The interaction potential for
the polarization model consists of two parts:

$$\Phi = \Phi_{\mathrm{I}} + \Phi_{\mathrm{II}} \tag{1}$$

In Eq. (1), the first term is composed of a sum of potentials for each pair of particles
in the system:

$$\Phi_{\mathrm{I}} = \sum_{i<j=1}^{n_{\mathrm{H}}} \phi_{\mathrm{HH}}(r_{ij}) + \sum_{i=1}^{n_{\mathrm{H}}} \sum_{j=1}^{n_{\mathrm{O}}} \phi_{\mathrm{OH}}(r_{ij}) + \sum_{i<j=1}^{n_{\mathrm{O}}} \phi_{\mathrm{OO}}(r_{ij}) \tag{2}$$

The second term is a nonadditive potential, whose form is suggested by classical
electrostatics for polarizable particles. Upon using Å as the distance unit and
kcal/mol as the energy unit, the three distinct atom pair functions which are comprised
in $\Phi_{\mathrm{I}}$ have the following forms:

$$
\begin{aligned}
\phi_{\mathrm{HH}}(r) &= 332.1669/r \\
\phi_{\mathrm{OH}}(r) &= (332.1669/r)\,[10\exp(-3.69939282\,r) - 2] \\
&\quad + [-184.6966743\,(r - r_e) + 123.9762188\,(r - r_e)^2]\,\exp[-16\,(r - r_e)^2] \\
\phi_{\mathrm{OO}}(r) &= 1328.6676/r + 8.255\exp[-18.665\,(r - 2.45)] \\
&\quad + 84.293/(1 + \exp[2.778\,(r - 2.56)]) \\
&\quad - 12.299/(1 + \exp[4.817\,(r - 3.10)])
\end{aligned}
\tag{3}
$$

Here, $r_e$ stands for the equilibrium bond length in water (0.9584 Å). The polarization
interaction requires self-consistent calculation of induced dipole moments on each of
the oxygen particles. The moment on particle $l$ is determined by
($\mathbf{r}_{lj} = \mathbf{r}_j - \mathbf{r}_l$)

$$\boldsymbol{\mu}_l = -\alpha \sum_{j \neq l} \frac{\mathbf{r}_{lj}\, q_j}{r_{lj}^3}\,[1 - K_0(r_{lj})] - \alpha \sum_{m \neq l} \frac{\mathbf{T}_{lm} \cdot \boldsymbol{\mu}_m}{r_{lm}^3}\,[1 - K_0(r_{lm})] \tag{4}$$

Here, $\alpha = 1.444$ Å$^3$ is the oxygen polarizability. The first sum covers all
charges $q_j$ (excluding that on $l$), while the second sum includes all other dipoles
$\boldsymbol{\mu}_m$. Also, the dyadic tensor $\mathbf{T}_{lm}$ is defined by

$$\mathbf{T}_{lm} = \mathbf{1} - 3\,\mathbf{r}_{lm}\mathbf{r}_{lm}/r_{lm}^2 \tag{5}$$

The dimensionless factors $1 - K_0$ account for the spatial extension of the polarizable
electron cloud about each oxygen. Specifically, we have

$$
\begin{aligned}
1 - K_0(r) &= r^3/[r^3 + F(r)] \\
F(r) &= 1.855785223\,(r - r_e)^2 \exp[-8\,(r - r_e)^2] + 16.95145727 \exp(-2.702563425\,r)
\end{aligned}
\tag{6}
$$

Once the induced moments have been obtained, the polarization energy can be computed
as a sum of modified charge-dipole interactions:

$$\Phi_{\mathrm{II}} = \frac{1}{2} \sum_{l} \sum_{i \neq l} \frac{(\boldsymbol{\mu}_l \cdot \mathbf{r}_{li})\, q_i}{r_{li}^3}\,[1 - L_0(r_{li})] \tag{7}$$

The dimensionless factors $1 - L_0$ likewise account for electron-cloud extension, with

$$
\begin{aligned}
1 - L_0(r) = 1 - \exp(-3.169888166\,r)\,[&1 + 3.169888166\,r + 5.024095492\,r^2 \\
&- 17.99599078\,r^3 + 23.92285\,r^4]
\end{aligned}
$$

The details about the polarization model are given in the literature [23, 24].
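As a rough illustration of Eqs. (3) and (6), the pair functions and damping factors can be written out directly. The following Python sketch is not the authors' program: the function names (`phi_hh`, `phi_oh`, `phi_oo`, `one_minus_k`, `one_minus_l`) are chosen here for illustration only, and the constants are simply copied from the text as printed.

```python
import math

R_E = 0.9584  # equilibrium O-H bond length in water, in angstroms (value from the text)


def phi_hh(r):
    """H-H pair function of Eq. (3): bare Coulomb repulsion, kcal/mol with r in angstroms."""
    return 332.1669 / r


def phi_oh(r):
    """O-H pair function of Eq. (3): screened Coulomb term plus a short-range bond term."""
    d = r - R_E
    coulomb = (332.1669 / r) * (10.0 * math.exp(-3.69939282 * r) - 2.0)
    bond = (-184.6966743 * d + 123.9762188 * d * d) * math.exp(-16.0 * d * d)
    return coulomb + bond


def phi_oo(r):
    """O-O pair function of Eq. (3). Note that 1328.6676 = 4 x 332.1669."""
    return (1328.6676 / r
            + 8.255 * math.exp(-18.665 * (r - 2.45))
            + 84.293 / (1.0 + math.exp(2.778 * (r - 2.56)))
            - 12.299 / (1.0 + math.exp(4.817 * (r - 3.10))))


def one_minus_k(r):
    """Damping factor 1 - K0(r) of Eq. (6), used in the induced-dipole equation."""
    d = r - R_E
    f = (1.855785223 * d * d * math.exp(-8.0 * d * d)
         + 16.95145727 * math.exp(-2.702563425 * r))
    return r ** 3 / (r ** 3 + f)


def one_minus_l(r):
    """Damping factor 1 - L0(r), used in the polarization energy."""
    poly = (1.0 + 3.169888166 * r + 5.024095492 * r ** 2
            - 17.99599078 * r ** 3 + 23.92285 * r ** 4)
    return 1.0 - math.exp(-3.169888166 * r) * poly
```

At large separations the exponential terms vanish, so the pair functions reduce to plain Coulomb interactions and both damping factors tend to 1; that limit is a convenient sanity check on any transcription of these constants.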

Potential energy profiles for the proton transfer reactions in water clusters, (H2O)n
(n = 2, 3, 4, 5, 6), are obtained by the following procedure. The optimum
geometries of the water clusters are determined using the potential functions of the
polarization model and the minimization method [25]. The dimer is linear and the trimer
is cyclic-like. The tetramer, pentamer, and hexamer are cyclic. These structures of
water clusters are consistent with other results [26]. Among the several ROH values,
which are the bond lengths between a hydrogen-donor oxygen atom and the transferred
hydrogen atom, one ROH is chosen as the reaction coordinate (in the dimer, of course,
there is only one ROH). ROH is increased or decreased by a certain magnitude (usually
0.1 Å), and the other atoms, except the one oxygen atom and one hydrogen atom that are
fixed to define the reaction coordinate, are optimized to obtain the minimum energy
geometries. In other studies, the positions of the protons in simultaneous proton
transfer are determined artificially, while in our study, the protons move freely
except for the one hydrogen atom that is fixed as the reaction coordinate.
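The procedure described above is what is commonly called a relaxed scan: one coordinate is stepped and held fixed while everything else is re-minimized at each step. The sketch below shows only the control flow, under loud assumptions: `toy_energy` is a hypothetical two-variable function invented for this illustration (it is not the polarization model), and a golden-section search stands in for the minimization method of Ref. [25], whose details are not given in the text.

```python
def golden_min(f, lo, hi, tol=1e-8):
    """Golden-section search for the minimizer of a unimodal function f on [lo, hi]."""
    g = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            # Minimum lies in [a, d]: shrink from the right.
            b, d = d, c
            c = b - g * (b - a)
        else:
            # Minimum lies in [c, b]: shrink from the left.
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


def toy_energy(r_scan, r_free):
    """Hypothetical model energy: r_scan is the fixed coordinate, r_free is relaxed."""
    coupling = 50.0 * (r_scan - 1.0) ** 2
    return coupling * (r_free - (r_scan + 1.0)) ** 2 + 20.0 * (r_free - 2.6) ** 2


def relaxed_scan(energy, start, step, n_points, lo, hi):
    """Step the scanned coordinate; at each step re-minimize over the free coordinate."""
    profile = []
    for i in range(n_points):
        r_scan = start + i * step
        r_free = golden_min(lambda x: energy(r_scan, x), lo, hi)
        profile.append((r_scan, r_free, energy(r_scan, r_free)))
    return profile


# Eight points in 0.1 steps, mirroring the 0.1 angstrom increments quoted in the text.
profile = relaxed_scan(toy_energy, start=0.9, step=0.1, n_points=8, lo=2.0, hi=3.5)
```

In a real cluster the relaxed degrees of freedom are all remaining atomic coordinates rather than a single variable, so a multidimensional minimizer would replace `golden_min`.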

DNA  Base  Pairs 

Ab initio SCF calculations with the STO-3G basis set have been performed to obtain the
potential energy profiles for the proton transfers in the adenine-thymine and
guanine-cytosine DNA base pairs. All calculations in this study have been done on an
IBM 3083 computer using a version of the Gaussian 82 system of programs. The structures
of the adenine-thymine and guanine-cytosine base pairs are shown in Figure 2. Bond
lengths, angles, and intermolecular distances between DNA bases are taken from Arnott
et al. [27]. In the adenine-thymine base pair, the bond length between N„ and H13 and
that between N16 and H22 were elongated in steps of 0.1 Å along the lines N„-O23 and
N16-N,,, respectively, and energies for double proton transfer were calculated [see
Fig. 2(a)]. In the guanine-cytosine base pair, two cases of the transferred protons are
possible. One is the pair of H10 and H24 and the other is that of H12 and H24 [see
Fig. 2(b)]. In the first case, the bond length between N, and H10 and that between N23
and H24 were elongated in steps of 0.1 Å along the lines N,-Nr and N23-O14,
respectively, and energies for double proton transfer were calculated. Similarly, in
the second case, the bond length between N11 and H12 and that between N23 and H24 were
elongated in steps of 0.1 Å along the lines N11-O2q and N23-O14, respectively, and
energies for double proton transfer were calculated.
Results and Discussions

Water  Clusters 

We obtained potential energies and their profiles for proton transfer reactions in
water clusters, (H2O)n (n = 2, 3, 4, 5, 6). These results are summarized in Tables I
to V and Figures 3 to 7. The proton transfer reaction in the dimer shows a single-well
potential energy profile, while the proton transfer reactions in the trimer, tetramer,
pentamer, and hexamer show double-well potential energy profiles. We also obtained
three-dimensional pictures for proton transfer reactions in water clusters. Figure 8
shows the three-dimensional pictures for the proton transfer reaction in the dimer as
an example of single


Table I. Potential energies for proton transfer reaction in dimer.

RC^a    Potential energy (kcal/mol)
1       -2066.17
2       -2072.66
3       -2066.11
4       -2053.73
5       -2040.94
6       -2029.90
7       -2019.80
8       -2008.99

^a When RC = 1, RO1H1 = 0.894 Å. As RC is increased by 1, RO1H1 is increased by
0.1 Å. RO1H1 is the distance between oxygen (1) and hydrogen (1) in dimer.


Table II. Potential energies for proton transfer reaction in trimer.

RC^a    Potential energy (kcal/mol)
1       -3107.58
2       -3114.11
3       -3108.38
4       -3097.60
5       -3102.81
6       -3108.91
7       -3111.66
8       -3110.91

^a When RC = 1, RO1H1 = 0.900 Å. As RC is increased by 1, RO1H1 is increased by
0.1 Å. RO1H1 is the distance between oxygen (1) and hydrogen (1) in trimer.


Table III. Potential energies for proton transfer reaction in tetramer.

RC^a    Potential energy (kcal/mol)
1       -4157.56
2       -4163.46
3       -4162.29
4       -4159.41
5       -4162.50
6       -4162.71
7       -4161.14

^a When RC = 1, RO1H1 = 0.927 Å. As RC is increased by 1, RO1H1 is increased by
0.1 Å. RO1H1 is the distance between oxygen (1) and hydrogen (1) in tetramer.


Table IV. Potential energies for proton transfer reaction in pentamer.

RC^a    Potential energy (kcal/mol)
1       -5218.19
2       -5221.57
3       -5220.23
4       -5221.95
5       -5222.42
6       -5220.13
7       -5215.94

^a When RC = 1, RO1H1 = 1.007 Å. As RC is increased by 1, RO1H1 is increased by
0.1 Å. RO1H1 is the distance between oxygen (1) and hydrogen (1) in pentamer.
Table V. Potential energies for proton transfer reaction in hexamer.

RC^a    Potential energy (kcal/mol)
1       -6270.92
2       -6273.30
2.5     -6273.20
3       -6274.41
4       -6275.85
5       -6275.59
6       -6272.36

^a When RC = 1, RO1H1 = 1.029 Å. As RC is increased by 1, RO1H1 is increased by
0.1 Å. RO1H1 is the distance between oxygen (1) and hydrogen (1) in hexamer.


Figure 2(b). Structures of DNA base pairs. Guanine-cytosine base pair.


proton transfer, while Figure 9 shows the three-dimensional pictures for the proton
transfer reaction in the tetramer as an example of simultaneous proton transfer. In
Figure 8, hydrogen (1) is transferred from oxygen (1) to oxygen (2). As the reaction
coordinate (RC) is increased from 4 to 5, hydrogen (1), which is close to oxygen (1)
when RC = 4, becomes close to oxygen (2). In Figure 9, hydrogen (3), hydrogen (5), and
hydrogen (7) are transferred simultaneously with hydrogen (1). Similarly, as RC is
increased from 4 to 5, hydrogen (1), hydrogen (3), hydrogen (5), and hydrogen (7),
which are close to oxygen (1), oxygen (2), oxygen (3), and oxygen (4), respectively,
when RC = 4, become close to oxygen (4), oxygen (1), oxygen (2), and oxygen (3),
respectively. As a result, the proton transfer reaction in the dimer shows single
proton transfer and a single-well potential energy profile, while the proton transfer
reactions in the trimer, tetramer, pentamer, and hexamer show simultaneous proton
transfers and double-well potential energy profiles. The facts that single proton
transfer results in a single-well
Figure 3. Potential energy profile for proton transfer reaction in dimer.

Figure 4. Potential energy profile for proton transfer reaction in trimer.

Figure 5. Potential energy profile for proton transfer reaction in tetramer.

Figure 6. Potential energy profile for proton transfer reaction in pentamer.

Figure 7. Potential energy profile for proton transfer reaction in hexamer.

Figure 9. Three-dimensional picture for proton transfer reaction in tetramer.
potential and simultaneous proton transfers result in double-well potentials are
consistent with other studies [28, 29]. Namely, in the dimer, the potential energy
increases monotonically and results in a single-well potential, because single proton
transfer leads to a pair of charged species. But in the trimer, tetramer, pentamer, and
hexamer, simultaneous proton transfers generally preserve electroneutrality in the
system and the potential energies result in double-well potentials. In cyclic
geometries especially, simultaneous proton transfers occur easily, since the barriers
of the simultaneous proton transfer reactions are relatively small.
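The single-well versus double-well classification, and the size of the barriers, can be read off the tabulated energies mechanically. The Python sketch below is an illustration added here, not part of the original analysis: the helper names are hypothetical, and the energies are those of Tables I to V, with minus signs restored where the scan dropped them (an assumption based on the neighbouring entries).

```python
# (RC, potential energy in kcal/mol) from Tables I to V.
TABLES = {
    "dimer": [(1, -2066.17), (2, -2072.66), (3, -2066.11), (4, -2053.73),
              (5, -2040.94), (6, -2029.90), (7, -2019.80), (8, -2008.99)],
    "trimer": [(1, -3107.58), (2, -3114.11), (3, -3108.38), (4, -3097.60),
               (5, -3102.81), (6, -3108.91), (7, -3111.66), (8, -3110.91)],
    "tetramer": [(1, -4157.56), (2, -4163.46), (3, -4162.29), (4, -4159.41),
                 (5, -4162.50), (6, -4162.71), (7, -4161.14)],
    "pentamer": [(1, -5218.19), (2, -5221.57), (3, -5220.23), (4, -5221.95),
                 (5, -5222.42), (6, -5220.13), (7, -5215.94)],
    "hexamer": [(1, -6270.92), (2, -6273.30), (2.5, -6273.20), (3, -6274.41),
                (4, -6275.85), (5, -6275.59), (6, -6272.36)],
}


def local_minima(points):
    """Interior points whose energy is below both neighbours."""
    return [p1 for p0, p1, p2 in zip(points, points[1:], points[2:])
            if p1[1] < p0[1] and p1[1] < p2[1]]


def forward_barrier(points):
    """Barrier from the first minimum to the highest point before the second minimum.

    Returns None for a single-well profile.
    """
    minima = local_minima(points)
    if len(minima) < 2:
        return None
    (rc_a, e_a), (rc_b, _) = minima[0], minima[1]
    top = max(e for rc, e in points if rc_a < rc < rc_b)
    return top - e_a


barriers = {name: forward_barrier(rows) for name, rows in TABLES.items()}
```

On these data the dimer has a single interior minimum, while the barriers obtained for the larger clusters decrease with cluster size, which is in line with the remark that simultaneous transfer is easy in cyclic geometries. The coarse 0.1 Å grid means these are only grid-point estimates.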

DNA  Base  Pairs 

The calculated energies for the double proton transfers in DNA base pairs are listed
in Tables VI to VIII. Figures 10 to 12 also show the energy profiles for the double
proton transfers in DNA base pairs.

Adenine-thymine base pair: The double proton transfer in the adenine-thymine base
pair gives a double-well potential energy profile. From Table VI and Figure 10, one
sees two minima (when RC = 2 and RC = 10). The barrier height for the double proton
transfer is 70.24 kcal/mol.

Guanine-cytosine base pair (H10 and H24): The double proton transfer in the
guanine-cytosine base pair (the pair of H10 and H24) gives a double-well potential
energy profile. From Table VII and Figure 11, one sees two minima (when RC = 2 and
RC = 10). The barrier height for the double proton transfer is 48.85 kcal/mol.

Guanine-cytosine base pair (H12 and H24): The double proton transfer in the
guanine-cytosine base pair (the pair of H12 and H24) gives a double-well potential
energy profile. From Table VIII and Figure 12, one also sees two minima (when RC = 2
and RC = 9). The barrier height for the double proton transfer is 64.02 kcal/mol.


Table VI. Energies for double proton transfer in adenine-thymine DNA base pair (H13
and H22).

RC    d(N,-H13) (Å)    d(N16-H22) (Å)    E (a.u.)      ΔE^a (kcal/mol)
1     0.91             0.92              -904.26040    20.45
2     1.01             1.02              -904.29299     0.00
3     1.11             1.12              -904.28411     5.57
4     1.21             1.22              -904.25518    23.73
5     1.31             1.32              -904.22124    45.03
6     1.41             1.42              -904.19383    62.22
7     1.51             1.52              -904.18105    70.24
8     1.61             1.62              -904.18493    67.81
9     1.71             1.72              -904.19933    58.77
10    1.81             1.82              -904.21099    51.46
11    1.91             1.92              -904.20056    58.00

^a Energy difference when compared with the energy (RC = 2).
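The ΔE column follows from the total energies by subtracting the RC = 2 value and converting from atomic units. The short Python sketch below reproduces that arithmetic for Table VI; the conversion factor 627.5095 kcal/mol per hartree is an assumption of this illustration (the text does not state which value was used), and the derived reverse barrier is not quoted in the original.

```python
HARTREE_TO_KCAL = 627.5095  # assumed conversion factor, kcal/mol per hartree

# (RC, total energy in a.u.) from Table VI (adenine-thymine, H13 and H22).
TABLE_VI = [
    (1, -904.26040), (2, -904.29299), (3, -904.28411), (4, -904.25518),
    (5, -904.22124), (6, -904.19383), (7, -904.18105), (8, -904.18493),
    (9, -904.19933), (10, -904.21099), (11, -904.20056),
]


def relative_kcal(rows, ref_rc=2):
    """Energies relative to the reference point, converted to kcal/mol."""
    ref = dict(rows)[ref_rc]
    return {rc: (e - ref) * HARTREE_TO_KCAL for rc, e in rows}


delta = relative_kcal(TABLE_VI)

# Barrier seen from the normal form (RC = 2): highest point between the two minima.
forward_barrier = max(delta[rc] for rc in range(2, 11))

# Barrier seen from the tautomeric minimum (RC = 10); derived here, not in the text.
reverse_barrier = forward_barrier - delta[10]
```

The forward barrier obtained this way agrees with the 70.24 kcal/mol quoted for the adenine-thymine pair to within rounding of the tabulated energies.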
Table VII. Energies for double proton transfer in guanine-cytosine DNA base pair
(H10 and H24).

RC    d(N,-H10) (Å)    d(N23-H24) (Å)    E (a.u.)      ΔE^a (kcal/mol)
1     0.92             0.91              -919.99521    21.60
2     1.02             1.01              -920.02963     0.00
3     1.12             1.11              -920.02368     3.73
4     1.22             1.21              -919.99939    18.98
5     1.32             1.31              -919.97244    35.89
6     1.42             1.41              -919.95443    47.19
7     1.52             1.51              -919.95178    48.85
8     1.62             1.61              -919.96311    41.74
9     1.72             1.71              -919.97897    31.79
10    1.82             1.81              -919.98336    29.04
11    1.92             1.91              -919.95326    47.92

^a Energy difference when compared with the energy (RC = 2).

Table VIII. Energies for double proton transfer in guanine-cytosine DNA base pair
(H12 and H24).

RC    d(N11-H12) (Å)    d(N23-H24) (Å)    E (a.u.)      ΔE^a (kcal/mol)
1     0.91              0.91              -919.99412    22.28
2     1.01              1.01              -920.02963     0.00
3     1.11              1.11              -920.02338     3.92
4     1.21              1.21              -919.99727    20.31
5     1.31              1.31              -919.96628    39.75
6     1.41              1.41              -919.94111    55.55
7     1.51              1.51              -919.92813    63.69
8     1.61              1.61              -919.92761    64.02
9     1.71              1.71              -919.93193    61.31
10    1.81              1.81              -919.92514    65.57
11    1.91              1.91              -919.88186    92.73

^a Energy difference when compared with the energy (RC = 2).
Figure 10. Energy profile for double proton transfer in adenine-thymine DNA base pair
(H13 and H22).

Figure 11. Energy profile for double proton transfer in guanine-cytosine DNA base pair
(H10 and H24).


From the above results, one can conclude that the double proton transfers in DNA
base pairs give double-well energy profiles and that Löwdin's proton tunneling model
for mutagenesis [22] can be adopted. Namely, an error mechanism in genetic code
replication as a consequence of tautomer formation can be considered a possible error
mechanism, because the double proton transfers in DNA base pairs give double-well
energy profiles. But an error mechanism in genetic code replication as a consequence
of ion-pair formation, which is incurred by single proton transfer, may be considered
an unreasonable error mechanism, since Clementi et al. [28] pointed out that a single
proton transfer in a DNA base pair, which formed an ion-pair, gave a single-well energy
profile characterized by a monotonically increasing energy function.

In conclusion, if we adopt Löwdin's error mechanism in genetic code replication as a
consequence of tautomer formation, our calculated results indicate that an error in
genetic code replication occurs more easily in the guanine-cytosine base pair than in
the adenine-thymine base pair, because the barrier heights of the double proton
transfers in the guanine-cytosine base pair (48.85 kcal/mol and 64.02 kcal/mol) are
lower than that in the adenine-thymine base pair (70.24 kcal/mol). Also, in the
guanine-cytosine base pair, an error in genetic code replication occurs more easily in
the pair of H10 and H24 than in the pair of H12 and H24, because the barrier height of
the double proton transfer in the pair of H10 and H24 (48.85 kcal/mol) is lower than
that in the pair of H12 and H24 (64.02 kcal/mol). However, to obtain more reliable
quantitative results for the proton transfers in DNA base pairs, further studies are
needed to obtain potential energies with a larger basis set and to optimize the
geometries of the DNA base pairs, which may require a lot of computing time. In the
near future, we will carry out these extended studies and predict tunneling rates
using the improved results (e.g., more reliable potential barrier heights and band
widths).

Figure 12. Energy profile for double proton transfer in guanine-cytosine DNA base pair
(H12 and H24).
Abstract

Practical  aspects  of  the  calculation  of  the  proton  transfer  process  in  a  model  of  the  active  site  of  the  thiol 
protease  papain  are  explored  with  basis  sets  of  different  sizes.  Results  from  ab  initio  calculations  with  the 
STO-3G,  4-31G,  6-31G,  6-31G*  basis  set,  and  a  6-31G  basis  set  augmented  with  polarization  functions  on 
the  sulfur  atom  are  compared  for  their  performance  in  describing  the  proton  transfer  energy.  The  nature  of 
the  convergence  of  the  calculated  properties  of  the  potential  curve  for  proton  transfer  with  the  increase  in 
basis  set  indicates  the  need  for  a  split-valence  basis  set  and  for  polarization  functions  on  the  sulfur  in  order 
to  achieve  an  appropriate  description  of  this  system.  Correlation  corrections  to  the  calculated  energies  are 
shown  to  contribute  significantly  to  the  characteristics  of  the  proton  transfer  energy  curve. 


Introduction 

Proton transfer is a common element in many mechanisms proposed for enzymatic
reactions [1,2]. The role of the proton transfer in these schemes is to generate an
acid-base pair which can be chemically reactive. For example, in serine proteases the
nucleophile Ser⁻ is formed by a transfer of the Ser proton to histidine, forming a
Ser⁻···HisH⁺ pair in the active site [3]; in cysteine proteases the proton from a Cys
is transferred to a His to form a strong nucleophile S⁻ [for a review see Refs. 1-3].
Because the proton transfer has a central role in such enzymatic mechanisms, much
effort is spent investigating the details of the process. Many theoretical studies have
focused on the proton transfer process in a variety of small model systems
[4-9, 19, 20]. Important details of this process have been revealed by such studies,
and a common finding has been that the calculated potential energy curve is strongly
basis set dependent. This paper evaluates some practical considerations in the
calculations of the proton transfer mechanism proposed for the enzymatic reaction of
papain by Drenth et al. [10, 11]. The first step in this mechanism is a transfer of the
proton from the sulfur of Cys-25 to the Nδ1 of the imidazole of His-159 to form a
charge separation in the active site, referred to as the "zwitterion" state.

Based on the conclusions reached from theoretical studies on other model systems
(see above), a first concern in the investigation of this mechanism is the
identification of a suitable basis set for the calculation of the potential energy
curve which describes the proton transfer. The characteristics of the proton transfer
energy curves obtained from calculations with a variety of basis sets are compared here
in order to identify the minimal computational level at which the results agree with
those from the most extensive calculations that were practicable for the system.

* On leave from the Laboratory of Chemical Physics, University of Groningen,
Nijenborgh 16, 9747 AG Groningen, The Netherlands.

Computational  Details 

The coordinates of the model active site were taken from the x-ray structure of
papain refined to 1.65 Å resolution [12]. The residues of the active site that
participate in the proton transfer process were modeled by methanethiol for Cys-25, and
by imidazole for His-159 (Fig. 1). Throughout the proton transfer process the S-Nδ1
distance was kept fixed at the value of 3.38 Å observed in the crystal.

The ab initio molecular orbital calculations were performed with the GAUSSIAN82 [13]
and the HONDO [14] program packages. The basis sets used in this comparison were
STO-3G [15], 4-31G [16], 6-31G [17], 6-31G* [18], and 6-31G+, which is the 6-31G basis
set augmented with a set of 3d functions (six primitives) on the sulfur atom, with an
exponent of 0.65. Correlation effects were calculated with the Møller-Plesset
perturbation expansion to second (MP2) and third (MP3) order. Frozen core MP2 and MP3
calculations were performed only with the 6-31G and the 6-31G+ basis sets.


Figure 1. Structure of the model for the residues Cys-25 and His-159 in the active site
of papain. The coordinates were taken from the x-ray structure [12] and the moving
proton (shown near the sulfur atom) was placed on the S-Nδ1 axis.
A total of 16 points was used to construct the potential energy curves. The proton
was displaced along the line between the sulfur and the nitrogen (Fig. 1). The energy
was calculated at equidistant displacements of the proton from the sulfur, separated
by 0.1 Å. The extrema of the curve, MinI, MinII, and Max, were calculated from a fit
to a polynomial of degree 15. MinI is the minimum with the proton near the sulfur,
MinII is the minimum with the proton near the Nδ1, and Max is the maximum in the
potential energy curve. The parameters chosen to characterize the potential energy
curve are E^stab, which is defined as (MinII - MinI), and E^act, defined as
(Max - MinI).

The correlation corrections to the energy were calculated for the three points on the
curve representing the two minima and the transition state.
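Because a polynomial of degree 15 through 16 points is an exact interpolant, the fitting step can be sketched with Newton divided differences followed by a fine grid search for the extrema. The Python example below is a hedged illustration only: `model_curve` is a hypothetical asymmetric double well invented here (the individual ab initio energies are not tabulated in the text), and the grid search stands in for whatever root-finding the authors used.

```python
def newton_coefficients(xs, ys):
    """Divided-difference coefficients of the polynomial interpolating (xs, ys)."""
    coef = list(ys)
    n = len(xs)
    for k in range(1, n):
        for i in range(n - 1, k - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - k])
    return coef


def newton_eval(xs, coef, x):
    """Evaluate the Newton-form polynomial at x with a Horner-like recurrence."""
    result = coef[-1]
    for i in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result


def curve_extrema(xs, ys, step=0.001):
    """Locate interior minima and maxima of the interpolant on a fine grid."""
    coef = newton_coefficients(xs, ys)
    n_grid = int(round((xs[-1] - xs[0]) / step))
    grid = [xs[0] + i * step for i in range(n_grid + 1)]
    vals = [newton_eval(xs, coef, x) for x in grid]
    minima, maxima = [], []
    for i in range(1, len(grid) - 1):
        if vals[i] < vals[i - 1] and vals[i] < vals[i + 1]:
            minima.append((grid[i], vals[i]))
        elif vals[i] > vals[i - 1] and vals[i] > vals[i + 1]:
            maxima.append((grid[i], vals[i]))
    return minima, maxima


def model_curve(r):
    """Hypothetical double-well energy (kcal/mol) versus S-H distance r (angstroms)."""
    u = r - 1.85
    return 400.0 * u ** 4 - 100.0 * u ** 2 + 10.0 * u


# Sixteen equidistant points, 0.1 angstrom apart, as in the text.
xs = [1.1 + 0.1 * i for i in range(16)]
ys = [model_curve(x) for x in xs]

minima, maxima = curve_extrema(xs, ys)
(min1_r, min1_e), (min2_r, min2_e) = minima
((max_r, max_e),) = maxima

e_act = max_e - min1_e    # E^act  = Max   - MinI
e_stab = min2_e - min1_e  # E^stab = MinII - MinI
```

High-degree interpolation on equidistant points can oscillate near the ends of the interval for less smooth data, so extrema found close to the first or last point deserve extra scrutiny.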

Results  and  Discussion 

Figure 2 shows the proton transfer energy curve calculated with the STO-3G basis
set. The curve has a minimum for the proton near the sulfur atom, but no minimum
when it approaches the Nδ1 of the imidazole. A change in the slope of the potential
energy curve occurs when the proton approaches the Nδ1 atom. The single minimum,
as well as the other characteristics of the curve, are similar to those reported for
the same system [19, 20] from a calculation with the minimal basis set developed by
Mehler and Paul [21].


Figure 2. Energies of the model active site of papain (see Fig. 1) as a function of the
position of the proton. The calculations were done with the STO-3G basis set and the
results are shown relative to a zero value defined as the energy of the system with the
proton near the sulfur atom at an S-H distance of 1.1 Å.
The results of calculations with the split-valence basis set 4-31G are presented in
Table I and in Figure 3. With this basis set, two minima appear in the proton transfer
curve. The energies calculated with the STO-3G and the 4-31G basis sets are compared
in Table II and Figure 4(a). The results show that the calculation with the larger
basis set affects the zwitterion more than the other two states of the system. Thus,
the energy of the zwitterion is lowered far more than the energy of the neutral form or
the transition state. This indicates the importance of a split-valence basis set for
the description of a state in which a charge separation has occurred.


Table I. Basis set dependence of the characteristic parameters of the proton transfer
curve in the model active site of papain.^a

             STO-3G    4-31G    6-31G    6-31G*
R(MinI)      1.332     1.374    1.370    1.339
R(Max)       -         2.009    1.968    2.000
E^act        -         21.1     21.1     31.8
R(MinII)     -         2.297    2.317    2.312
E^stab       -         16.6     13.3     25.6

^a R is the distance of the proton to the sulfur, in Å. The extrema on the curve
(MinI, Max, MinII), the activation energy (E^act), and the stabilization energy
(E^stab) are calculated as described in the text. Energies are expressed in kcal/mol.
Figure 3. Energies of the model active site of papain (see Fig. 1) as a function of the
position of the proton. The calculations were done with the 4-31G, 6-31G, and 6-31G*
basis sets, as indicated. The results are shown relative to a zero value defined as the
energy of the system with the proton at an S-H distance of 1.1 Å, calculated with the
corresponding basis set.
Table II. Basis set dependence of the energies of the active site model of papain as a
function of the position of the proton.*

R(SH)    E1         E2         E3        E4
1.1      -4253.8    -4693.0    -439.2    -29.7
1.2      -4250.0    -4690.4    -440.4    -27.9
1.3      -4249.6    -4690.4    -440.9    -25.8
1.4      -4252.1    -4693.0    -440.8    -23.6
1.5      -4257.2    -4697.7    -440.5    -21.3
1.6      -4264.1    -4704.2    -440.1    -19.2
1.7      -4272.3    -4712.2    -439.8    -17.4
1.8      -4281.4    -4721.4    -439.9    -16.0
1.9      -4291.2    -4731.5    -440.3    -14.8
2.0      -4301.0    -4742.1    -441.0    -13.9
2.1      -4309.7    -4751.7    -442.0    -13.2
2.2      -4320.1    -4763.2    -443.1    -12.8
2.3      -4328.8    -4773.0    -444.2    -12.4
2.4      -4337.7    -4782.6    -445.0    -12.2
2.5      -4349.0    -4795.6    -446.6    -12.0

* R is the distance of the proton to the sulfur, in Å. Values in the columns are
obtained from differences in energies calculated with different basis sets, defined as
follows: E1 = E(4-31G) - E(STO-3G), E2 = E(6-31G) - E(STO-3G),
E3 = E(6-31G) - E(4-31G), E4 = E(6-31G*) - E(6-31G); all energy differences are in
kcal/mol.
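By the definitions in the footnote, E3 must equal E2 - E1 at every proton position, which gives a simple internal check on the table. The Python sketch below performs that check and also re-expresses a column relative to its value at 1.1 Å, the convention used for Figure 4. The helper names are hypothetical and the numbers are transcribed from Table II as printed.

```python
# (R(SH) in angstroms, E1, E2, E3, E4), all in kcal/mol, from Table II.
TABLE_II = [
    (1.1, -4253.8, -4693.0, -439.2, -29.7),
    (1.2, -4250.0, -4690.4, -440.4, -27.9),
    (1.3, -4249.6, -4690.4, -440.9, -25.8),
    (1.4, -4252.1, -4693.0, -440.8, -23.6),
    (1.5, -4257.2, -4697.7, -440.5, -21.3),
    (1.6, -4264.1, -4704.2, -440.1, -19.2),
    (1.7, -4272.3, -4712.2, -439.8, -17.4),
    (1.8, -4281.4, -4721.4, -439.9, -16.0),
    (1.9, -4291.2, -4731.5, -440.3, -14.8),
    (2.0, -4301.0, -4742.1, -441.0, -13.9),
    (2.1, -4309.7, -4751.7, -442.0, -13.2),
    (2.2, -4320.1, -4763.2, -443.1, -12.8),
    (2.3, -4328.8, -4773.0, -444.2, -12.4),
    (2.4, -4337.7, -4782.6, -445.0, -12.2),
    (2.5, -4349.0, -4795.6, -446.6, -12.0),
]


def max_closure_error(rows):
    """Largest deviation of the tabulated E3 from E2 - E1 over all rows."""
    return max(abs((e2 - e1) - e3) for _, e1, e2, e3, _ in rows)


def relative_to_first(rows, column):
    """Column values shifted so that the first row (R = 1.1) is zero."""
    ref = rows[0][column]
    return {row[0]: row[column] - ref for row in rows}


closure = max_closure_error(TABLE_II)
e1_relative = relative_to_first(TABLE_II, 1)
e4_relative = relative_to_first(TABLE_II, 4)
```

The closure error stays at the 0.1 kcal/mol rounding level of the table. The relative E1 values become strongly negative as the proton moves toward the nitrogen, which is the numerical content of the statement that the split-valence basis lowers the zwitterion far more than the neutral form; the relative E4 values are positive, so polarization functions favor the neutral form.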


Extending the basis set to 6-31G, which provides an improved description of the
core electrons, results in qualitatively the same proton transfer energy curve as that
calculated with the 4-31G basis set. Quantitatively, the energies from the 6-31G
calculation are about 440 kcal/mol lower than the 4-31G results [Table II and
Fig. 4(b)]. Unlike the transition from a minimal basis set (STO-3G) to the
split-valence basis set (4-31G), there is little improvement in the description of the
zwitterion with respect to the neutral form in going from 4-31G to 6-31G. Consequently,
the activation energy, E^act, is the same in the calculations with both basis sets, and
the stabilization energy, E^stab, is only 3.3 kcal/mol smaller in the case of the 6-31G
basis set. Augmenting the 6-31G basis set with 3d functions produces a dramatic
change in the calculated values of E^act and E^stab, which are increased by 10.7 and
12.3 kcal/mol, respectively. The characteristics of the proton transfer curve
calculated with these basis sets are presented in Table I and Figure 3. The results
presented in Table II and Figure 4(b) show that with the augmented basis set, the
neutral form is more stabilized than the zwitterion. Because this result could be due
to the unequal representation of the sulfur and the nitrogen by the basis set augmented
only on the sulfur, it is noteworthy that even with the 6-31G* basis set the neutral
form is stabilized more than the zwitterion relative to the results with the 6-31G
basis set (Table III). The values of E^act and E^stab calculated with the 6-31G+ and
the 6-31G* basis sets are very similar, differing only by 3.2 and 3.5 kcal/mol,
respectively (Table IV). These results indicate that the 3d functions on the sulfur are
the main contributors to the improvement produced by the addition of polarization
functions.
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Figure  4.  Differences  of  the  total  energies  of  the  model  active  site  of  papain  (see  Fig.  1) 
calculated  with  different  basis  sets.  The  values  are  shown  relative  to  the  energy  difference 
calculated  for  an  S-H  distance  of  1  I  A  with  the  corresponding  basis  sets,  (a)  The  effect 
of  split-valence  representations:  Difference  between  energies  obtained  with  double  zeta 
basis  sets  and  the  minimal  basis  set:  El  =  E(4-31G)  -  E(sto-3G),  E2  =  E(6-31G 
E(sto-3G);  (b)  The  effect  of  improved  core  in  the  split-valence  representation,  and  the  et  \i 
of  polarization  functions:  E3  =  E(6-31G)  -  E(4-31G),  E4  =  E(6-3!G*)  -  E(6-31G). 

Table III. Additional energy stabilization with respect to results obtained with the
6-31G basis set provided by polarization functions on sulfur alone (6-31G+) and on
all heavy atoms (6-31G*).*

              Basis set
R(S-H)    6-31G+    6-31G*
1.4       -23.6     -99.0
2.0       -13.9     -86.1
2.3       -12.4     -84.3

* R(S-H) is the distance of the proton from the sulfur in Å. The three points
correspond to the extrema in the proton transfer curve calculated with the 6-31G basis set.
Energies are in kcal/mol.


Correlation energy corrections at the MP2 and the MP3 level calculated with the
6-31G and the 6-31G+ basis sets for the two minima and the transition state are shown
in Table IV. The MP2 correction calculated with the 6-31G basis set lowers E^act by
4.1 kcal/mol, to a new value of 16.7 kcal/mol. The MP3 contribution has the
opposite sign and raises E^act back to 18.1 kcal/mol. Notably, Min II is almost eliminated
by the correlation contributions. To evaluate the possibility that the position of the
second minimum was shifted from the previously observed minimum at 2.3 Å by
the addition of the correlation correction, the MP2 contribution was also calculated for
the proton closer to S, at an S-H distance of 2.2 Å. The choice of this point was
based on the observation that MP2 tends to increase slightly the optimal bond length
[22], in this case between the proton and Nδ1. The calculated SCF + MP2 energy at
the S-H distance of 2.2 Å is only 0.9 kcal/mol lower than the energy of the system
at a distance of 2.3 Å.

Table IV. Values of the activation energies (E^act) and stabilization energies (E^stab)
calculated with various basis sets and correlation energy corrections.*

                E^act    E^stab
6-31G           20.8     13.2
6-31G+          30.4     24.3
6-31G*          33.6     27.8
6-31G/MP2       16.7     18.3
6-31G/MP3       18.2     17.7
6-31G+/MP2      24.9     27.5

* Results are taken from calculations at the three distances of the moving proton
from the sulfur, at which the extrema of the potential energy curve for proton transfer
occur in the calculation with the 6-31G basis set: R = 1.4 Å, 2.0 Å, and 2.3 Å.
Energies are in kcal/mol.

The effect of correlation in calculations with the 6-31G+ basis set is nearly the
same as for the 6-31G basis set. Due to the prohibitively large computational effort,
only the MP2 correction for the activation barrier could be calculated with this basis
set. The correlation correction decreases the barrier by 5.5 kcal/mol, which is not
very different from the MP2 correction for the activation barrier calculated with 6-31G
(4.1 kcal/mol).
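
The differences quoted in the text can be recomputed from the entries of Table IV. The short Python sketch below does this; it assumes that the two polarized rows of Table IV are, in order, 6-31G+ and 6-31G*, an assignment inferred from the quoted differences, and the names `TABLE_IV` and `difference` are illustrative only.

```python
# (E_act, E_stab) in kcal/mol as tabulated in Table IV.
# Assumption: the two polarized rows are 6-31G+ then 6-31G*.
TABLE_IV = {
    "6-31G": (20.8, 13.2),
    "6-31G+": (30.4, 24.3),
    "6-31G*": (33.6, 27.8),
    "6-31G/MP2": (16.7, 18.3),
    "6-31G/MP3": (18.2, 17.7),
    "6-31G+/MP2": (24.9, 27.5),
}


def difference(level_a, level_b):
    """Return (dE_act, dE_stab) = level_a - level_b, rounded to 0.1 kcal/mol."""
    a, b = TABLE_IV[level_a], TABLE_IV[level_b]
    return tuple(round(x - y, 1) for x, y in zip(a, b))


if __name__ == "__main__":
    # Polarization on all heavy atoms versus on sulfur alone.
    print("6-31G* minus 6-31G+:", difference("6-31G*", "6-31G+"))
    # MP2 lowering of the barrier with and without sulfur d functions.
    print("MP2 barrier lowering, 6-31G:", difference("6-31G", "6-31G/MP2")[0])
    print("MP2 barrier lowering, 6-31G+:", difference("6-31G+", "6-31G+/MP2")[0])
```

With this assignment the sketch returns the 3.2 and 3.5 kcal/mol differences between the two polarized basis sets and the 4.1 and 5.5 kcal/mol MP2 barrier corrections quoted in the text.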


Conclusions 

The results of this study show that a basis set with at least a split-valence
representation is required to describe the proton transfer in the active site of papain. Even the
smallest split-valence basis set used here (4-31G) yielded a double well potential for
the proton transfer. The characteristics of this potential curve, namely, E^act and E^stab,
converged to the values obtained with 6-31G* as the basis sets were gradually
augmented. The results also emphasize the need for a basis set with polarization
functions on the sulfur atom for the adequate representation of this system. Because the
MP2 correlation corrections are similar in magnitude to the stabilization energy of the
zwitterion with respect to the transition state, it is not possible to establish whether
the double well nature of the potential will be maintained when electron correlation is
completely accounted for.

Abstract

Dose-response relationships of drug-receptor binding show that receptor sites are, in many cases, singly
occupied by the drug molecules. Although this single-site occupancy may be demonstrated for bound
hormone analogues which inhibit, stimulate, or partially stimulate the response, the molecular occupancy of
the receptor site is essentially statistical in character, and the observed binding constant may represent a
sum of conformer contributions. These conformer contributions are proportionately weighted by the
relevant conformer fractions of the drug and receptor for each interaction. In practice, more than one
conformer may bind productively to the receptor, while, on the other hand, even within one identifiable
conformation, restriction on the fraction of molecules eliciting a response could produce partial agonism.
The thermodynamic representation of explicit models of receptor interaction is reviewed, taking into
account the reference phase of the receptor environment and its potential heterogeneity. Decomposition of
thermodynamic data for membrane-bound β-adrenoceptor agents shows that referencing the data to a
hydrocarbon environment produces more comparative insight into enthalpic differences. Differences in the
enthalpies of binding of the phenoxypropanolamine derivatives practolol and propranolol are largely due to
loss of hydration on the amidic carbonyl moiety of practolol. Using this hydrocarbon model reference state
for comparison, major differences in the enthalpies of binding of the amine moiety in phenethanolamines
and phenoxypropanolamines are observed. There is a 6-7 kcal enthalpic loss in substituting a methyl
group on the protonated amine moiety of noradrenaline, and a further similar loss of 6-7 kcal in
substituting t-butyl for the isopropyl group. In contrast, the phenoxypropanolamine derivatives show an
approximately constant mode of binding for these alkyl substituents. The possibilities that the amine moiety is
sited differently in phenethanolamine and phenoxypropanolamine binding, and is multiply hydrogen
bonded to three receptor sites in the natural hormone, are explored. The identification of bioactive
conformers in intracellular and membrane-bound receptor agents is also reviewed.

Introduction 

The pharmacological concept of a receptor is derived from the highly specific
binding exhibited by hormones and drug molecules (concentration 10^-9 molar) in
given sites coupled with the emission or competitive inhibition of a measurable signal
(response). Dose-response relationships [1-4] show that the sites in many cases are
singly occupied by the drug molecules.

Cellular receptors [5] for hormones and neurotransmitters are subdivided into those
localized within the cell (within the nucleus and the cytoplasm), which are often water
soluble, and the hydrophobic cell membrane surface receptors. The intracellular
hormones exhibit diverse roles, as may be seen in the regulatory morphological role of
the steroids androgen and estrogen in the developmental biology of the sexes. In
contrast, a wide variety of ligands for membrane-bound receptors (the peptide and
glycoprotein hormones, the biogenic amines, and the prostaglandins) are associated
with coupling to the enzyme adenylate cyclase. Therapeutic advantages in changes in
the level and degree of hormone regulation are exemplified by antisteroid treatments
in prostate and breast cancer, and by selective control of the sympathetic
neurotransmitter noradrenaline on the adenylate cyclase coupled β-adrenoceptors, with the
appropriate regulation of cardiac, bronchial, and vascular responses.

Although single-site occupancy may, in many cases, be demonstrated for bound
hormone analogues which inhibit, stimulate, or partially stimulate the response (in
pharmacological terms, antagonists, agonists, and partial agonists, respectively), the
molecular occupancy of the receptor sites is essentially statistical in character and the
observed binding constant may represent a sum of individual conformer
contributions. Thus, more than one conformer may bind productively to the receptor while,
on the other hand, even within one identifiable conformation, restriction on the
fraction of molecules eliciting a response might produce a population balance for partial
agonism. Again, a bioactive conformer may be energetically unfavorable, with a
minor population in the conformer distribution of the unbound molecules and a
resultant loss in potency. Binding constants, it need hardly be emphasized, are free
energy-related quantities.

Thermodynamic data on β-adrenoceptor binding at the membrane level indicate
major differences between the binding of agonists and antagonists. Agonists bind
with highly favorable enthalpies but with large negative entropies of complex
formation [6-9]. Antagonists show relatively weak bonding but with favorable entropies
for receptor binding. A more quantitative understanding of the thermodynamic changes
involved could obviously be deduced if the full structures of the receptor were known,
when partition functions for the receptor complex might be written and
thermodynamic functions computed. However, even where the receptor structures are not
known, comparative binding studies should yield some knowledge of the receptor
sites, the appropriate cancellation of terms of the partition function leaving a potential
residue of analyzable information.

It  is  the  purpose  of  this  review  to  examine  what  structural  information  is  derivable 
from  drug-receptor  data  when  the  molecular  details  of  the  receptor  are  not  known, 
and  to  examine  the  viability  of  explicit  models  of  drug-receptor  interaction. 

Representational  Models  of  Receptor-Stimulus  Actions 

The  most  general  representational  model  for  relating  receptor  binding  and  stimulus 
action,  when  the  dose-response  relation  is  essentially  hyperbolic,  is  due  to  Black  and 
Leff  [10]. 

Where the law of mass action pertains for binding of the drug A to the receptors R,
then the resultant concentration of the drug-receptor complex AR is given by the normal
hyperbolic relation

$$[AR] = \frac{K_A [A] R_T}{1 + K_A [A]} \tag{1}$$

where $K_A$ is the binding constant of the drug and $R_T$ the total number of receptors.
The observed stimulus is some function of the concentration [AR]. For a hyperbolic
function to exist between the dose A and the stimulus response, two possible relations
join [AR] to the observed effect. The effect may be (1) a linear function or (2) a
hyperbolic function of [AR].

Thus if the effect E is given by

$$E = \frac{K_E [AR]}{1 + K_E [AR]} \tag{2}$$

where $K_E$ represents some amplification factor of the signal, then

$$E = \frac{\tau K_A [A]}{1 + (\tau + 1) K_A [A]} \quad \text{where } \tau = K_E R_T \tag{3}$$

or

$$E = \frac{e K'_A [A]}{1 + K'_A [A]} \quad \text{where } e = \frac{\tau}{\tau + 1} \text{ and } K'_A = (\tau + 1) K_A \tag{4}$$

Three parameters, the binding constant $K_A$, the amplification factor $K_E$, and the
total number of receptors $R_T$, define the cognitive and transducer functions of the drug
upon the receptor. To relate this model to molecular mechanisms of drug-receptor
action, it is necessary to consider explicit models of molecular interaction.
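
The algebra linking Eqs. (1) to (4) can be checked numerically. The following Python sketch composes the occupancy relation (1) with the hyperbolic transducer (2) and compares the result with the closed forms (3) and (4); the function names and the parameter values in the usage lines are arbitrary illustrations, not data from the original work.

```python
import math


def occupancy(a, k_a, r_t):
    """Eq. (1): concentration of the complex AR."""
    return k_a * a * r_t / (1.0 + k_a * a)


def effect_from_occupancy(ar, k_e):
    """Eq. (2): hyperbolic transducer acting on [AR]."""
    return k_e * ar / (1.0 + k_e * ar)


def effect_closed_form(a, k_a, k_e, r_t):
    """Eq. (3): effect in terms of the dose, with tau = K_E * R_T."""
    tau = k_e * r_t
    return tau * k_a * a / (1.0 + (tau + 1.0) * k_a * a)


def effect_rescaled(a, k_a, k_e, r_t):
    """Eq. (4): the same curve written with e and the apparent constant K'_A."""
    tau = k_e * r_t
    e = tau / (tau + 1.0)
    k_apparent = (tau + 1.0) * k_a
    return e * k_apparent * a / (1.0 + k_apparent * a)


if __name__ == "__main__":
    a, k_a, k_e, r_t = 2.0e-9, 1.0e9, 0.5, 10.0  # illustrative values only
    composed = effect_from_occupancy(occupancy(a, k_a, r_t), k_e)
    print(math.isclose(composed, effect_closed_form(a, k_a, k_e, r_t)))
    print(math.isclose(composed, effect_rescaled(a, k_a, k_e, r_t)))
```

Equation (4) makes explicit that the observed dose-response curve is again hyperbolic, with a maximal effect e below unity and an apparent binding constant larger than $K_A$ by the factor $(\tau + 1)$.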

Molecular  Models  of  Receptor  Stimulus  Action. 

Some  Thermodynamic  Relations 

Since  only  a  fraction  of  drug  molecules  may  be  relevant  to  binding  to  the  receptor 
(and  only  a  fraction  of  the  receptor  conformations  may  be  relevant  to  binding  to  the 
drug)  it  is  convenient  to  represent  the  gross  binding  constant  in  a  conformer  represen¬ 
tation  [11]. 

In terms of standard partial free energies, $\mu^\circ$, the gross binding constant K may be
written as:

$$\mu^\circ_{AR} - \mu^\circ_A - \mu^\circ_R = -kT \log K \tag{5}$$

where the subscripts AR, A, and R refer to the complex, drug, and receptor,
respectively.

Using second indices to identify the conformer i of the drug A engaged in binding,
with $j^x$ its receptor (R) counterpart, then, for the $ij^x$ conformer interaction, (5) may
be written

$$\mu^\circ_{A_iR_{j^x}} + (\mu^\circ_{AR} - \mu^\circ_{A_iR_{j^x}}) - \mu^\circ_{A_i} - (\mu^\circ_A - \mu^\circ_{A_i}) - \mu^\circ_{R_{j^x}} - (\mu^\circ_R - \mu^\circ_{R_{j^x}}) = -kT \log K \tag{6}$$

Using conformer populations $f^i$ of A, and $f^{j^x}$ of R, and the relations

$$\mu^\circ_A - \mu^\circ_{A_i} = kT \log f^i$$

$$\mu^\circ_R - \mu^\circ_{R_{j^x}} = kT \log f^{j^x}$$

and summing over the bound states, then

$$\sum_i \sum_{j^x} K^{ij^x} f^i f^{j^x} = K \tag{7}$$

where $K^{ij^x}$, the conformer binding constant, is given by

$$\mu^\circ_{A_iR_{j^x}} - \mu^\circ_{A_i} - \mu^\circ_{R_{j^x}} = -kT \log K^{ij^x} \tag{8}$$

The binding constant is a sum of the conformer binding constants weighted by their
appropriate conformer fractions.
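
As a numerical illustration of Eq. (7), the Python sketch below evaluates the gross binding constant as the conformer-weighted sum and also returns the fraction of bound complexes in each state, taken here as $K^{ij^x} f^i f^{j^x} / K$, which is what Eqs. (6) to (8) imply when the bound-state fractions sum to unity. The function names and the numbers in the usage lines are hypothetical.

```python
def gross_binding_constant(k_conf, f_drug, f_rec):
    """Eq. (7): K as the sum of K_ij * f_i * f_j over drug and receptor conformers."""
    return sum(
        k_conf[i][j] * f_drug[i] * f_rec[j]
        for i in range(len(f_drug))
        for j in range(len(f_rec))
    )


def bound_state_fractions(k_conf, f_drug, f_rec):
    """Fraction of the bound complex found in each conformer pair (i, j)."""
    k_gross = gross_binding_constant(k_conf, f_drug, f_rec)
    return [
        [k_conf[i][j] * f_drug[i] * f_rec[j] / k_gross for j in range(len(f_rec))]
        for i in range(len(f_drug))
    ]


if __name__ == "__main__":
    # Hypothetical case: a tightly binding minor conformer (10% of the free drug)
    # and a weakly binding major conformer, with a single receptor conformer.
    k_conf = [[1.0e9], [1.0e7]]
    f_drug = [0.1, 0.9]
    f_rec = [1.0]
    print(gross_binding_constant(k_conf, f_drug, f_rec))
    print(bound_state_fractions(k_conf, f_drug, f_rec))
```

In this hypothetical case the minor conformer dominates the bound population even though it is only a tenth of the free drug, which is the situation in which the observed potency becomes sensitive to the conformer fraction.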

It is more convenient to define the conformer binding constant $K_0^{ij^x}$ referenced to
the average states of A and R.

$$\mu^\circ_{A_iR_{j^x}} - \mu^\circ_A - \mu^\circ_R = -kT \log K^{ij^x} f^i f^{j^x} = -kT \log K_0^{ij^x} \tag{9}$$

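The observed standard free energy change on binding can also be written as a
weighted sum over the bound states of the conformer binding constant terms,
together with the associated entropy of mixing

$$\Delta G^\circ = -RT \sum_i \sum_{j^x} p^{ij^x} \log K^{ij^x} f^i f^{j^x} + RT \sum_i \sum_{j^x} p^{ij^x} \log p^{ij^x}$$

$$\qquad\; = -RT \sum_i \sum_{j^x} p^{ij^x} \log K_0^{ij^x} + RT \sum_i \sum_{j^x} p^{ij^x} \log p^{ij^x} \tag{10}$$

where $p^{ij^x}$ denotes the fraction of the bound complex in the state $ij^x$.

Differentiating (9) with respect to temperature and using the relations

$$\frac{\partial (\log f^i)}{\partial T} = \frac{H^i - H^\circ_A}{RT^2}, \qquad \frac{\partial (\log f^{j^x})}{\partial T} = \frac{H^{j^x} - H^\circ_R}{RT^2} \tag{11}$$

then

$$RT^2 \frac{\partial (\log K_0^{ij^x})}{\partial T} = \Delta H_0^{ij^x} = \Delta H^{ij^x} + (H^i - H^\circ_A) + (H^{j^x} - H^\circ_R)$$

$$\text{or} \quad \Delta H_0^{ij^x} = H^{ij^x} - H^\circ_A - H^\circ_R \tag{12}$$

Using the standard enthalpy of the bound states

$$H^\circ_{AR} = \sum_i \sum_{j^x} p^{ij^x} H^{ij^x} \tag{13}$$

the standard enthalpy for the binding becomes

$$\Delta H^\circ = \sum_i \sum_{j^x} p^{ij^x} \Delta H_0^{ij^x} \tag{14}$$

The standard entropy change on binding similarly may be written

$$\Delta S^\circ = \sum_i \sum_{j^x} p^{ij^x} \Delta S_0^{ij^x} - \sum_i \sum_{j^x} R \, p^{ij^x} \log p^{ij^x} \tag{15}$$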
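
The decomposition of the binding free energy into a weighted sum plus a mixing term can be verified numerically. The Python sketch below compares $-RT \log K$ with the weighted sum over bound states and the mixing term of Eq. (10); it assumes natural logarithms, takes the bound-state fractions as $K^{ij^x} f^i f^{j^x} / K$, and uses hypothetical numbers and function names throughout.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def binding_free_energy_two_ways(k_conf, f_drug, f_rec, temp_k=310.0):
    """Return (-RT ln K, weighted-sum form of Eq. (10)) in kcal/mol."""
    rt = R_KCAL * temp_k
    # K_0 for each bound state: K_ij * f_i * f_j, as defined in Eq. (9).
    k0 = {
        (i, j): k_conf[i][j] * f_drug[i] * f_rec[j]
        for i in range(len(f_drug))
        for j in range(len(f_rec))
    }
    k_gross = sum(k0.values())
    direct = -rt * math.log(k_gross)
    # Fractions of the bound complex in each state; empty states drop out.
    p = {key: val / k_gross for key, val in k0.items() if val > 0.0}
    weighted = -rt * sum(p[key] * math.log(k0[key]) for key in p)
    mixing = rt * sum(frac * math.log(frac) for frac in p.values())
    return direct, weighted + mixing


if __name__ == "__main__":
    direct, decomposed = binding_free_energy_two_ways(
        [[1.0e9, 2.0e8], [1.0e7, 5.0e6]], [0.1, 0.9], [0.7, 0.3]
    )
    print(f"{direct:.4f} {decomposed:.4f}")
```

The two values agree because the mixing term exactly compensates for the spread of the bound population over several conformer pairs.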

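It may be convenient to consider comparative drug binding with a change of
reference, to a hydrocarbon lipid phase, L. The standard free energy contribution of A in
(6) can then be written

$$(1) \quad -\mu^\circ_{A_i,L} - (\mu^\circ_A - \mu^\circ_{A,L}) - (\mu^\circ_{A,L} - \mu^\circ_{A_i,L})$$

$$(2) \quad -\mu^\circ_{A_i,L} - (\mu^\circ_A - \mu^\circ_{A_i}) - (\mu^\circ_{A_i} - \mu^\circ_{A_i,L}) \tag{16}$$

and since

$$\mu^\circ_{A,L} - \mu^\circ_{A_i,L} = kT \log f^i_L$$

$$\mu^\circ_{A_i} - \mu^\circ_{A_i,L} = kT \log P^i$$

$$\mu^\circ_A - \mu^\circ_{A,L} = kT \log P \tag{17}$$

where $f^i_L$ is the conformer fraction of i in a nonaqueous medium, and $P^i$ is the
conformer or micropartition coefficient of the species i [12], it follows that

$$(1) \quad \sum_i \sum_{j^x} K^{ij^x} f^i_L f^{j^x} P = K$$

$$(2) \quad \sum_i \sum_{j^x} K^{ij^x} f^i f^{j^x} P^i = K \tag{18}$$

with the conformer binding constant $K^{ij^x}$ here referenced to conformer i in the phase L.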

These relations may be observed from the free energy diagram in Figure 1. Appropriate
transformation of Eq. (10) may be similarly applied. The utility of such relations
is subsequently examined, but it may be observed that defined sets of conditions are
required if useful correlations in comparative drug data are to be obtained. Thus in
any comparative analogue set $K^{ij^x} f^{j^x}$ should be invariant. A more detailed discussion
is given elsewhere [13].

Figure 1. Schematic representation of free energy relations for the conformer i of drug A
interacting with the relevant receptor conformer $j^x$ of the receptor protein complex and
possible pathways for relating the bound conformer free energy $G_{A_iR_{j^x}}$ to the reference free
energy $G_A$.


Bioactive  Conformer  Identification 

The influence of conformer fraction on the binding constant of a drug may be
exemplified by axial-equatorial preference in a cyclic aliphatic ring system. Figure 2
shows the relative conformer fraction on the log10 scale plotted against the energetic
preference of the two conformers attributable to an anomeric effect.
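
A two-state Boltzmann estimate gives a feel for the scale of this effect. The Python sketch below computes the fraction of the higher-energy conformer for a given energy difference and its logarithm, which is the shift in log potency expected if only that conformer binds and the binding constant scales with the conformer fraction. The two-state assumption, the temperature of 310 K, and the function names are illustrative choices.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def minor_conformer_fraction(delta_e_kcal, temp_k=310.0):
    """Fraction of the higher-energy conformer in a two-state equilibrium."""
    boltzmann = math.exp(-delta_e_kcal / (R_KCAL * temp_k))
    return boltzmann / (1.0 + boltzmann)


def log10_potency_shift(delta_e_kcal, temp_k=310.0):
    """log10 of the minor-conformer fraction (zero or negative)."""
    return math.log10(minor_conformer_fraction(delta_e_kcal, temp_k))


if __name__ == "__main__":
    for delta_e in (0.0, 1.0, 2.0, 3.0):
        fraction = minor_conformer_fraction(delta_e)
        print(f"{delta_e:.1f} kcal: fraction {fraction:.3f}, "
              f"log10 shift {log10_potency_shift(delta_e):.2f}")
```

Once the energy preference exceeds about 2 kcal the favoured conformer already holds well over 90% of the population, so further stabilization changes its fraction very little, whereas the fraction of the disfavoured conformer keeps falling roughly tenfold for every 1.4 kcal.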

An obvious corollary from the figure is that where bond energetics appear invariant
to changes in electron distribution about the bond, the conformation is already
energetically dominant. Figure 3 shows a partial interpretation of β-adrenoceptor
antagonist action where a dominant conformation about the -O-CH2- moiety must exist.
Table I shows the bond rotational energetics for the -OCH3 group attached to various
aromatic and heterocyclic ring systems. For this group (but cf. N-CH3 [17]), STO-3G
minimal basis results give reasonable agreement compared with experiment and
with better basis set determinations. For β-adrenoceptor binding, the CH2 moiety
must lie planar with the aromatic ring [26].

We have found STO-3G calculations to give useful results in competitive conformer
preference in the antiandrogen anilide derivatives used to inhibit the male hormone
testosterone in the treatment of prostate cancer (Fig. 4). Table II shows a comparison
of predicted and experimental conformer populations. Similar results were obtained
with a 4-31G basis set on a reduced molecule (R1, R2 = H). Receptor binding
correlates with conformer 1, where convenient excitation of the hydrogen bond proton
donor interaction from the aromatic ring is attainable [18].

Figure 2. Relative conformer preference for an axial/equatorial substituent orientation in a
cyclic aliphatic ring system. The conformer fractions on the log10 scale are plotted against
the energetic difference for the two conformers attributable to an anomeric effect.

[Figure 3 annotations on the structure OCH2CH(OH)CH2NHCH(CH3)2: dominant conformation
about the ring-oxygen bond for β-antagonist action; dominant conformation about the
terminal C-N bond for β-antagonist action; small steric advantage of t-Bu over iPr; small
steric influence; bonding to receptor.]

Figure 3. Partial interpretation of the action of p-acylamino phenoxypropanolamine
derivatives on the cardiac β-adrenoceptor [13].

When  the  fraction  of  the  biologically  active  conformer  is  only  minor,  on  the  other 
hand,  marked  sensitivity  to  the  size  of  the  fraction  pertains.  A  possible  example  is 
contained  in  data  on  analogues  of  the  central  nervous  system  agent  viloxazine 
(R  =  OC2H5,  Fig.  5)  which  inhibits  neuronal  reuptake  of  biogenic  amines. 

A  plot  of  potency  in  vivo  against  a  partitioning  effect  is  given  in  Figure  6.  If  the 
residual  variation  from  this  data  is  plotted  against  the  conformer  fraction  where  the 
side  chain  is  now  perpendicular  to  the  aromatic  ring  (Fig.  7)  then  a  simple  correlation 
is  observed. 

Such interpretations rely on a single conformer of the drug binding productively at
the receptor. If more than one conformer binds to the receptor, then a range of
behavior from the highly stereospecific to the nonspecific may occur. Other effects are
possible. If one of the conformers is the arbiter of stimulus action, partial agonism
may result as a competitive occupancy of the receptors by active and inactive
conformers. The model for this action will be reviewed, but it is first necessary to
consider the choice of reference state for comparing drug data in the case of
membrane-bound receptors.


Thermodynamic Parameters of Ligand Binding to the β-Adrenergic Receptor.
Choice of Reference Phase for Comparative Data

As seen from the previous example of viloxazine data, more stringent conditions
are required to identify a bioactive conformer when more than one conformer may be
acting upon the hormone receptors. Bound conformations may, of course, be inferred
from fixed conformation active isomers, but the choice of reference state for
comparing drug data is crucial for correct identification of active conformers in flexible
molecules.

Table I

Conformational energy preference,                      Concentration difference,
planar/perpendicular                                   37°C

1.2 kcal (STO-3G)(a); 0.7 kcal (4-31G)(b)              5:1
0.0 kcal (STO-3G)                                      1:1
2.0 kcal (STO-3G)                                      25:1
1.3 kcal(c), synperiplanar to H relative to
antiperiplanar (in CDCl3, NMR)                         8:1
1.2 kcal (4-31G), synperiplanar to H2 or L';
antiperiplanar less favored                            7:7:1

(a) From Ref. 14.
(b) From Ref. 15.
(c) From Ref. 16.

Figure 4. The steroid androgen testosterone, and the antagonist anilide used in the
treatment of prostate cancer.

Table II

                                Relative conformer            Relative conformer
                                populations F on the          population from infrared*
                  STO-3G        log10 scale at 310 K          measurements
R1      R2        ΔE (kcal)     log10(F1/F2)                  log10(F1/F2)

CH3     CH3       +1.5          +1.05                         >1.0
CH3     CF3       -0.14         -0.10                         -0.18
CF3     CF3       -1.8          -1.3                          Anion at physiological pH

R = 2-OCH3: does not release catecholamine from neuronal stores; inhibits reuptake.
R = 3-OCH3: releases catecholamine; effective reuptake.
R = 3-OC2H5: does not release catecholamine.

Figure 5. Some properties of closely related analogues of the central nervous system
agent viloxazine (R = 2-OC2H5), used to inhibit biogenic amine reuptake.

Figure 6. Potency in vivo of viloxazine analogues plotted against a partitioning effect
using the octanol/water model on the log10 scale.

Figure 7. Residual variation in potency of viloxazine analogues after allowance for a
partitioning effect, plotted on the log10 scale against the fraction of the conformers having the side
chain perpendicular to the ring.
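
The predicted population ratios of Table II are related to the STO-3G energy differences by a simple Boltzmann factor, log10(F1/F2) = ΔE / (2.303 RT). The Python sketch below recomputes them at 310 K; it assumes that ΔE is the energy of conformer 2 relative to conformer 1, which is the sign convention the tabulated values imply, and the function name is illustrative.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def log10_population_ratio(delta_e_kcal, temp_k=310.0):
    """log10(F1/F2) for an energy difference E2 - E1 given in kcal/mol."""
    return delta_e_kcal / (math.log(10.0) * R_KCAL * temp_k)


if __name__ == "__main__":
    # (R1, R2, STO-3G delta E in kcal, tabulated log10(F1/F2)) from Table II.
    rows = [
        ("CH3", "CH3", 1.5, 1.05),
        ("CH3", "CF3", -0.14, -0.10),
        ("CF3", "CF3", -1.8, -1.3),
    ]
    for r1, r2, delta_e, tabulated in rows:
        computed = log10_population_ratio(delta_e)
        print(f"{r1}, {r2}: computed {computed:+.2f}, tabulated {tabulated:+.2f}")
```

The recomputed values agree with the tabulated ones to within a few hundredths of a log unit, the residual presumably reflecting rounding of the energies.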

Previous evidence has shown that the phase environment in given regions around
the bound β-adrenoceptor antagonist and partial agonist molecules is hydrophobic in
character, and that modelling of substituent effects in these regions with an effective
hydrocarbon solvent model produces correlations of unit slope [13]. Even for alkyl
substitution on the protonated amine moiety, where the alkyl hydrogen atoms are
relatively acidic, a simple correlation appears evident in phenoxypropanolamine
derivatives (Fig. 8). However, the limit of the environment in the region of the bound
protonated amine moiety with its potential for attendant water molecules is unknown
in this charged species, and an aqueous rather than a hydrophobic medium might still
be the most relevant for modelling this region of the bound molecule, apart from the
need to site receptor counter ions. The alternative forms of Eq. (18) allow scope for
incorporating such potential heterogeneity, provided that the partition coefficient for
the relevant conformer can be estimated satisfactorily.

It should thus be informative to remove the intrinsic variation in the data due to
hydrophobic interactions, either by referencing the data to a theoretical hydrocarbon
environment or by use of substituent corrections suggested by Eq. (18), in order that
a clearer picture of the factors giving rise to the differences in the ΔH values might be
identified.

Figure 8. The influence of alkyl substitution on the amine moiety in propranolol
analogues. Potencies on the β-adrenoceptor in isolated central nervous system membranes [19]
plotted against a simple partitioning effect on the log10 scale using the iso-amyl acetate/
water model.


Thermodynamic data on ligand binding to the β-adrenergic receptor of turkey
erythrocytes are given in Table III [6]. A key of the ligands, which comprise
phenethanolamine and phenoxypropanolamine derivatives, is in Table IV. Agonists bind
with large negative entropies, while antagonists show relatively weak ΔH values but
favorable entropies for complex formation. Partial agonists show an intermediate behavior.

While bound conformations of phenethanolamines have been deduced from fixed
conformation active isomers [20], the active conformers of phenoxypropanolamines
upon the β-adrenoceptor are less certain [21-23]. To attempt to identify the bound
active conformers using Eq. (18), the requisite thermodynamic model for accounting
for the hydrophobic interactions may be considered.

Table III. Thermodynamic parameters of ligand binding to the β-adrenergic receptor
of turkey erythrocytes at 37°C.*

                       ΔG°               ΔH°               ΔS°
                       (kcal mol^-1)     (kcal mol^-1)     (entropy units)

Agonists
(-)-Isoprenaline       -9.39             -13.39            -12.9
(-)-Noradrenaline      -7.91             -18.86            -35.3
(-)-Adrenaline         -7.50             -12.75            -16.9

Partial agonists
Soterenol              -8.23             -7.84             +1.26
Metaproterenol         -6.78             -10.83            -13.06
Terbutaline            -6.19             -4.13             +6.65

Antagonists
(-)-Propranolol        -12.51            -3.85             +27.9
Pindolol               -11.85            -5.08             +21.8
Zinterol               -9.13             -3.06             +19.6
Metoprolol             -8.36             -0.66             +24.8
Sotalol                -8.21             -2.15             +19.5
Practolol              -7.46             +3.92             +36.7

*From Ref. 6.
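
The three columns of Table III are linked by ΔG° = ΔH° - TΔS°, so any one of them can be recomputed from the other two. The Python sketch below recovers ΔS° (in cal mol^-1 K^-1, the entropy units of the table) from ΔG° and ΔH° at 37°C and compares it with the tabulated value; the practolol entropy is entered as positive, the sign that this relation requires. The names `TABLE_III` and `entropy_units` are illustrative.

```python
T_KELVIN = 310.15  # 37 degrees C

# Ligand: (dG, dH) in kcal/mol and tabulated dS in cal/(mol K).
TABLE_III = {
    "isoprenaline": (-9.39, -13.39, -12.9),
    "noradrenaline": (-7.91, -18.86, -35.3),
    "adrenaline": (-7.50, -12.75, -16.9),
    "soterenol": (-8.23, -7.84, 1.26),
    "metaproterenol": (-6.78, -10.83, -13.06),
    "terbutaline": (-6.19, -4.13, 6.65),
    "propranolol": (-12.51, -3.85, 27.9),
    "pindolol": (-11.85, -5.08, 21.8),
    "zinterol": (-9.13, -3.06, 19.6),
    "metoprolol": (-8.36, -0.66, 24.8),
    "sotalol": (-8.21, -2.15, 19.5),
    "practolol": (-7.46, 3.92, 36.7),  # sign fixed by dG = dH - T dS
}


def entropy_units(delta_g_kcal, delta_h_kcal, temp_k=T_KELVIN):
    """Entropy change in cal/(mol K) from dG = dH - T dS."""
    return 1000.0 * (delta_h_kcal - delta_g_kcal) / temp_k


if __name__ == "__main__":
    for name, (dg, dh, ds_tab) in TABLE_III.items():
        print(f"{name:15s} computed {entropy_units(dg, dh):+7.2f}  tabulated {ds_tab:+7.2f}")
```

The computed entropies reproduce the tabulated values to within rounding, and the listing shows directly the pattern noted in the text: enthalpy-driven binding with unfavorable entropy for agonists, and entropy-driven binding for antagonists.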


The thermodynamics of transfer of the -CH2- group into cyclohexane and water are
given in Table V. The two processes are enthalpically and entropically controlled,
respectively, but the transfer between the two phases produces net changes of
particular simplicity, the favorable δΔG having approximately equal enthalpic and entropic
contributions. The position holds for a number of nonaqueous solvents [24]. This
relationship will be utilized for estimating nonpolar substituent transfer between water
and a hydrophobic environment.

For thermodynamic data on transfer of polar moieties from water to a nonaqueous
environment, partition data using a suitable hydrocarbon solvent model may similarly
be used [25]. In the case of hydrogen bond interaction with the receptor, the most
suitable model may be to mimic the hydrogen bond interaction with a suitable solvent
of appropriate bond strength, so that nonspecific effects of other groups may be
reasonably estimated. In the case of practolol, where hydrogen bond proton donor
interaction with the receptor from the p-acylamino group has been demonstrated, a
suitable model for phase transfer is a long-chain ester, the receptor bonding of the
proton donor being of similar strength on the free energy scale.

Tables VI and VII show predictions of adrenaline from isoprenaline and of
propranolol from practolol, allowing for simple phase transfer of the substituent groups.
Self-consistent results are obtained, the observed ΔH for practolol and propranolol being
dominated by loss of hydration on the practolol amidic carbonyl moiety. The
observed free energy difference referenced to the nonaqueous ester model is
0.5-0.6 kcal at 37°C.


OCH2CHOHCH2NHCH(CH3)2

Table V. Incremental thermodynamics of partitioning of the -CH2- group.*

310 K, kcal

Partitioning phases        δΔG       δΔH       -TδΔS

1. Cyclohexane/gas         -0.76     -1.12     +0.36
2. H2O/gas                 +0.18     -0.67     +0.85
Cyclohexane/H2O            -0.94     -0.45     -0.49

*From Ref. 24.

Table VI. Thermodynamic changes on alkylamino substitution in phenethanolamines.

310 K, kcal

                           ΔG°       ΔH°       -TΔS°

Isoprenaline               -9.39     -13.39    +4.00
δ[CH(CH3)2 - CH3]*         -1.70     -0.85     -0.85
Adrenaline prediction      -7.69     -12.54    +4.85
Experiment                 -7.50     -12.75    +5.24
Error                      +0.19     -0.21     +0.39

*Group contribution based on cyclohexane/H2O partitioning.
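
The arithmetic behind Table VI can be laid out explicitly. In the Python sketch below the adrenaline values are predicted by removing the tabulated group contribution for replacing isopropyl by methyl from the isoprenaline values, and the error is taken as experiment minus prediction. The isoprenaline -TΔS° term is entered as +4.00 kcal, the value implied by ΔG° - ΔH°; the variable and function names are illustrative.

```python
# (dG, dH, -T dS) in kcal at 310 K, from Table VI.
ISOPRENALINE = (-9.39, -13.39, 4.00)          # -T dS = dG - dH
GROUP_CONTRIBUTION = (-1.70, -0.85, -0.85)    # isopropyl relative to methyl
ADRENALINE_EXPERIMENT = (-7.50, -12.75, 5.24)


def predict_by_group_removal(parent, group):
    """Subtract a substituent group contribution from the parent ligand values."""
    return tuple(round(p - g, 2) for p, g in zip(parent, group))


def prediction_error(experiment, prediction):
    """Experiment minus prediction for each thermodynamic term."""
    return tuple(round(e - p, 2) for e, p in zip(experiment, prediction))


if __name__ == "__main__":
    predicted = predict_by_group_removal(ISOPRENALINE, GROUP_CONTRIBUTION)
    print("predicted adrenaline:", predicted)
    print("error:", prediction_error(ADRENALINE_EXPERIMENT, predicted))
```

Errors of a few tenths of a kcal in each term are what the text refers to as self-consistent prediction; deviations of several kcal, as discussed next, point instead to a genuine change in binding mode.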


Where self-consistency in prediction is not obtained, comparative differences
should highlight differences in binding requirements of the ligand to the receptor.
Table VIII shows differences affecting progressive alkyl substitution on the amine
moiety in phenethanolamines. There is an enthalpic loss of 6.5 kcal in going from
noradrenaline to adrenaline, and a further loss of 6.5 kcal in going from
isopropylamino to t-butylamino substitution. The free energy losses are also similar,
at ~1.5 kcal. This behavior is in striking contrast to phenoxypropanolamine
derivatives (Fig. 8), where the moieties H, CH3, CH(CH3)2, and C(CH3)3 have little
effect upon the potency when referenced to a hydrocarbon environment.
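
The additive bookkeeping behind Tables VI and VIII can be sketched in a few lines of Python. This is only an illustrative check under stated assumptions, not part of the original analysis: the dictionary keys and the function names `predict` and `deviation` are hypothetical, and the numbers are the noradrenaline, δ(CH3 - H), and adrenaline entries of Table VIII(a).

```python
# Illustrative sketch only: names and data layout are assumptions.
# Values (kcal, 310 K) are taken from Table VIII, case (a).
noradrenaline = {"dG": -7.91, "dH": -18.86, "minus_TdS": 10.9}
methyl_increment = {"dG": -0.84, "dH": -0.42, "minus_TdS": -0.42}  # +δ(CH3 - H)
adrenaline_experiment = {"dG": -7.5, "dH": -12.75, "minus_TdS": 5.2}


def predict(reference, increment):
    """Add a group increment to each thermodynamic term of the reference drug."""
    return {term: reference[term] + increment[term] for term in reference}


def deviation(experiment, prediction):
    """Experiment minus prediction for each thermodynamic term."""
    return {term: experiment[term] - prediction[term] for term in experiment}


prediction = predict(noradrenaline, methyl_increment)
delta = deviation(adrenaline_experiment, prediction)

for term in ("dG", "dH", "minus_TdS"):
    print(f"{term:10s} predicted {prediction[term]:7.2f}  deviation {delta[term]:+6.2f}")
```

Because the inputs are used unrounded, the deviations come out as about +1.25, +6.53 and -5.28 kcal, against the rounded +1.2, +6.5 and -5.3 listed in the table.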

Table IX shows an approximate estimate of differences in bonding when predicting
a phenethanolamine derivative from phenoxypropanolamine data. While the predictions
require experimental verification, the deviation is of a similar order, giving
δΔH of 6-7 kcal.

One possibility is that the amine moiety is sited differently in phenethanolamine
and phenoxypropanolamine binding, and is multiply hydrogen bonded to three receptor
sites in the natural hormone. Phenoxypropanolamines in contrast would appear to
be dominantly singly bonded to the receptor. If the agonist activity is primarily
attributable to the siting of the amine moiety, then this group would be sited differently
for antagonist action in phenoxypropanolamines. A possible hypothesis is given in
Figure 9 where phenoxypropanolamines are compared to a fixed agonist isomer of a
phenethanolamine [20]. The alternative hypothesis is that similar conformations exist
for antagonist and agonist action, and only a subset of the conformer space is
available for triggering agonist activity. Current work on the problem may be examined.
Table VII. Self-consistency of thermodynamic changes in phenoxypropanolamines (310 K, kcal).

                                          ΔG°       ΔH°      -TΔS°
Practolol                                -7.46     +3.92    -11.40
-p-NHCOCH3                               -2.72     -6.6*     +3.9*
                                        -10.2      -2.7      -7.5
Entropy correction for
  specific -NHCO conformation             0.42                0.42
OCH2CHOHCH2NHR1 [structure]             -10.6      -2.7      -7.9
[increment; label not legible]           -2.06     -1.03     -1.03
Prediction (2), OCH2CHOHCH2NHR1
  [structure]                             ...      -3.7      -8.9
[label not legible]                       ...      -3.85     -8.65

* Estimate.
R1 = CH(CH3)2.
[ΔG° entries of the last two rows are not legible.]


Molecular Models of Stimulus Action and the Black and Leff Operational Model

Using the definitions of Sections 2 and 3, and under given conditions, the
efficiency of a partial agonist e_B can be written [1]

I  Kb 

where K^{i*} is the conformer binding constant of the agonist conformer (the assumption
is made that a single conformer controls agonist activity), and K_B is the binding
constant of the drug B.
Table VIII. Thermodynamic changes on alkylamino substitution in phenethanolamines II (310 K, kcal).

                               ΔG°       ΔH°      -TΔS°
a) Noradrenaline -> Adrenaline
Noradrenaline                 -7.91    -18.86    +10.9
+δ(CH3 - H)                   -0.84     -0.42     -0.42
Adrenaline prediction         -8.7     -19.3     +10.5
Experiment                    -7.5     -12.75     +5.2
Δa                            +1.2      +6.5      -5.3

b) Metaproterenol -> Terbutaline
Metaproterenol                -6.78    -10.83     +4.05
+δ(CH3 - H)                   -0.86     -0.43     -0.43
Terbutaline prediction        -7.64    -11.26     +3.62
Experiment                    -6.19     -4.13     -2.06
Δb                            +1.5      +7.1      -5.7

c) Soterenol -> Zinterol
Soterenol                     -8.23     -7.84     -0.39
+δ(CH2φ - H)                  -3.43     -1.72     -1.72
Zinterol prediction          -11.86     -9.56     -2.11
Experiment                    -9.13     -3.06     -6.08
Δc                            +2.7      +6.5      -4.0

If the fraction of receptor conformers f^{i*} involved in agonist binding is approximately
constant for the close set of drug analogues under comparison, Eq. (19) may
be written on the logarithmic scale

\log K_B + \log\frac{e_B}{1 - e_B} - \log \tau_B = \log K'^{i*} + \log f^{i}, \quad \text{where } K'^{i*} = K^{i*} f^{i*} \qquad (20)

Using a linear model, the efficacy term reduces to log e_B. For a correlation to be
observed between the agonist effect and the conformer fraction f^i of the drugs, a
stringent set of conditions is required. The drug analogues must all bind in the same
way (K_B constant), and create a stimulant effect under equivalent conditions (K'^{i*}, τ_B
constant).

One way in which such conditions might be attained is by entropic restriction in a
set of close analogues. A possible example is given in the series of β-adrenoceptor
agents in Table X [26]. These phenoxypropanolamine derivatives, which show identical
binding constants when referenced to a nonpolar hydrocarbon environment, have a
stimulus effect controlled by the effective size of the ortho substituent. The data are
based on a sample of four rats, the maximal incremental effect on heart rate due to
β-adrenoceptor stimulation being ~230 beats/minute. It may be noted that the changes
in electron distribution in the derivatives R = H, OCH3, CH3, C2H5 are relatively
minor compared to the variation in stimulus effect.
Table IX. Differences between phenethanolamines and phenoxypropanolamines.

                                     ΔG°       ΔH°      -TΔS°
-δ(OCH2(Aryl) - H)(a)               -0.17     +0.18     -0.34
Prediction (1), CHOHCH2NHR1        -10.8      -2.5      -8.2
NHSO2CH3-substituted
  CHOHCH2NHR1 [structure]            ...      -2.15     -6.05
-δ(NHSO2CH3 - H)                    -2.45     -5.94(b)  +3.5(b)
Prediction (2)                     -10.7      -8.09     -2.55
δ(2) - (1)                           0.1      +5.6      -5.6

(a) Hydrocarbon/H2O model.
(b) Estimate.
R1 = CH(CH3)2.


The possible conformations of the phenoxypropanolamines are shown in Figure 10.
The intramolecular hydrogen-bonded species I, IV and II, V have the amine
moiety quite closely coincident with the expected position found in phenethanolamine
derivatives. Table XI shows conformer populations of the protonated species in D2O
[11]. A potential candidate as the arbiter of the stimulus effect is conformer I, IV, the
Figure 9. Comparison of a fixed active isomer of a phenethanolamine [20] with
phenoxypropanolamine conformers. The dotted line of the phenoxypropanolamine conformer
denotes siting of the amine moiety closely coincident to the position in phenethanolamines,
and a potential stimulant conformer.


Table X

R Substituent      Intrinsic sympathomimetic activity,
                   heart rate (beats/min)
H                  104 ± 7
CH3                 65 ± 11
OCH3               101 ± 7
C2H5                29 ± 7
CH2CH:CH2           32 ± 5
COCH3               74 ± 5
OCH2CH:CH2          56 ± 5
CF3                 90 ± 5
NO2                 66 ± 5
F                  117 ± 2
Figure 10. Primary conformers of phenoxypropanolamines.


Table XI. Populations in D2O/DCl (NMR).

              F       H      CH3    CH2CH:CH2   C2H5 (1)   C2H5 (2)
P(I, IV)     0.10    0.15    0.07     0.07        0.05       0.07
P(I, VI)     0.45    0.39    0.46     0.45        0.49       0.46
P(II, V)     0.12    0.01    0.12     0.09        0.09       0.06
P(II, VI)    0.02    0.11    0.03     0.09        0.03       0.09
P(III, VI)   0.34    0.34    0.33     0.31        0.33       0.32
likely contender for the antagonist conformer being I, VI, which has an approximately
invariant conformer population within the set of analogues. This latter conformer
has previously been proposed from X-ray studies on phenethanolamine and
phenoxypropanolamine derivatives [21].

Individual  conformer  populations  were  evaluated,  taking  into  account  the  coupling 
between  the  dihedral  angles  due  to  the  intramolecular  hydrogen  bonding,  utilizing  the 
five  most  probable  conformers  [27].  Even  so,  variations  between  conformer  popula¬ 
tions  assuming  dihedral  coupling  or  independence  were  quite  minor.  Most  variation 
was shown in the coupling constant for the R = C2H5 derivative, and these measurements
are shown separately in Table XI.

The model for evaluating conformer populations is not, however, complete for
correlation. Studies on the binding constants of this set of analogues, as stated above,
show that the ortho substituents are surrounded by hydrophobic interactions in these
drug-receptor complexes. From Eq. (18), the relevant conformer fractions in aqueous
and nonaqueous phases [12] are given by the relation

f_n^{i} P = f^{i} P^{i}

with f^i and f_n^i the conformer fractions in the aqueous and nonaqueous phases, and
the ratio of the gross (P) to conformer partition coefficient (P^i) is required. In the
intramolecular hydrogen-bonded species I, IV, the conformer partition coefficient P^i
is sensitive to electronic changes in ortho substitution with the direct electrostatic
effect on the protonated amine moiety and would require experimental model isomers
for estimation. We have, therefore, considered it worthwhile to evaluate f^i
theoretically by computing the intrinsic relative strengths of the intramolecular hydrogen
bond I, IV on the free energy scale. Preliminary studies on the problem evaluating
the partition function by direct integration techniques on intermolecular hydrogen
bonding in sterically hindered phenols showed the methodology to be feasible [28].

7. Conclusions

The  ability  to  derive  useful  structural  information  from  data  on  drug-receptor  inter¬ 
actions  when  details  of  the  receptor-protein  complex  are  not  known  has  been  exemp¬ 
lified  in  intracellular  and  cell  membrane  receptor  applications. 

The representational model of a binding constant as a sum of individual conformer
binding constants weighted by the relevant conformer fractions of drug and receptor
highlights one of the difficulties in working with partial information. An inherent
difficulty is that the relevant conformer fraction f^j of the receptor interacting with
the i-th conformer of the drug is unknown. Thus in maximizing the effective binding by
increasing the conformer binding constant K^{ij} and the relevant conformer fraction f^i
of the drug, no information is gained on increasing the relevant receptor conformer
fraction f^j. A random as opposed to a logical change in structure of the drug might prove
more effective by increasing f^j, and lead to the development of an improved structure.
It is at least logical in the design of a drug to increase K^{ij} and f^i as efficiently as
possible, in order that the number of random changes that may be introduced can be
maximized for a given effort. On the more optimistic side, it is of course possible to bring
a potential drug compound to within therapeutic limits by the use of such models.
The methodologies for doing so to achieve quantitative predictions appear reasonably
established.
Abstract

We  consider  the  problem  of  locating  an  active  fragment  (substructure)  common  to  a  class  of  biologically 
active  compounds  and  presumed  responsible  for  their  biological  activity  (therapeutic  or  toxic).  Our  ap¬ 
proach  is  graph-theoretical  in  that  molecules  are  represented  by  suitable  graph-theoretical  invariants.  Spe¬ 
cially  weighted  paths  in  the  molecular  graph  are  adopted  as  descriptive  elements.  By  selecting  different 
sets  of  atoms  one  searches  for  a  fragment  that  best  represents  the  relative  activities  of  the  compounds.  As 
an illustration we consider a dozen nitrosamine mutagens and analyze the cases of five-, six-, and
seven-atom fragments. The approach clearly indicates that a specific seven-atom fragment (for molecules
with up to 11 nonhydrogen atoms) can account for the relative mutagenic activities of the nitrosamines considered.

Introduction 

The importance of structure-activity relationships (SARs) is generally accepted
today, and considerable effort is now aimed at elaborating and developing schemes
which can lead to quantitative characterizations and predictions of activity. The key
to the SAR problem has always been the characterization of chemical structure. For
convenience, existing approaches to the problem may be classified as: (a) structure
cryptic, namely approaches in which limited structural information is explicitly used;
(b) structure implicit, schemes such as quantum mechanical models, which yield
information not directly interpretable in terms of structural components; and (c) structure
explicit, schemes which make sole use of structural information in the comparison
of results [1]. Graph-theoretical approaches are all structure explicit because one
starts by selecting appropriate structural (graph-theoretical) invariants and then uses
these in making comparative studies. Here we continue to explore and develop
graph-theoretic methodology and will show not only how selected graph invariants can be
used to represent molecules (chemicals), but also how they can be employed for the
characterization of local molecular features. In particular, we focus attention on the
search for substructures that can be identified as responsible for molecular activity in
families of closely related structures.

A  number  of  active  molecular  fragments  have  been  identified  empirically.  For 
example,  the  so-called  “morphine”  rule  [2]  defines  a  fragment  found  in  various 
morphine-related  molecules,  and  similarly  dopamines  appear  to  have  a  common, 
characteristic  grouping  of  atoms.  Inspection  of  molecular  diagrams  sometimes  sug¬ 
gests  such  fragments  and  visual  verification  suffices  to  establish  them.  However,  in 
other  instances,  the  fragments  may  be  unknown  or  largely  obscured,  and  in  need  of 
identification.  We  now  outline  a  graph-theoretical  procedure  that  identifies  potentially 
active  molecular  fragments  or  substructures  within  families  of  structurally  related 
compounds  displaying  similar  therapeutic  (toxic)  features.  In  the  next  section  we  first 
discuss  the  graph-theoretical  representation  of  structures  adopted  here,  and  then  fol¬ 
low  this  with  an  illustration  of  the  search  procedure  for  an  active  substructure  in  a  set 
of  nitrosamine  mutagens. 

Graph-Theoretical  Representation  of  a  Structure 

Mathematical  problems  are  frequently  solved  by  adopting  suitable  coordinates,  and 
sometimes  the  crucial  step  in  solving  a  problem  involves  the  proper  choice  of  coordi¬ 
nates.  Conversely,  a  poor  selection  of  coordinates  can  lead  to  tedious  calculations  and 
frustration.  A  prudent  selection  of  graph  invariants  can  similarly  facilitate  structure 
activity  analyses,  though  even  these  analyses  can  become  time-consuming  and  unpro¬ 
ductive  if  the  molecules  studied  are  represented  by  inappropriate  descriptors.  Experi¬ 
ence  suggests  that  certain  graph  invariants  are  likely  to  be  useful  representatives  of 
molecular structure. In particular, we may mention the Wiener number, W, which
represents the sum of the lengths of the shortest paths between all pairs of atoms in a
structure [3]; the Hosoya number, Z, which totals the number of ways of choosing sets of
mutually nonadjacent bonds in a structure, and as such represents the first
graph-theoretical index designed to represent a structure by a single number [4]; and,
finally, the connectivity index, χ [5], introduced by one of the present authors [6] as
a bond-additive quantity which differentiates contributions from bonds of different
type. More recently, additional descriptors were introduced, for example, the molecular
identification number, which represents the sum of suitably weighted molecular
paths [7]. This number is analogous to the Wiener number, but employs weights
shown to be useful in applications of the connectivity index, χ.

The  limitations  of  a  single-parameter  representation  of  structure  are  obvious,  even 
to  those  not  familiar  with  the  field  of  structure-property  studies.  What  is  surprising, 
and  sometimes  overlooked,  however,  is  how  much  structural  information  can  often  be 
condensed  into  a  single  number.  Nonetheless,  if  one  wants  to  go  beyond  a  certain 


ACTIVE  SUBSTRUCTURES  IN  STRUCTURE-ACTIVITY  STUDIES 




level,  the  next  logical  step  is  to  consider  several  parameters  (invariants).  The  first 
issue  to  be  settled  here  is  the  selection  of  the  invariants.  That  atomic  and  bond  contri¬ 
butions  play  a  dominant  role  in  most  applications  is  self-evident,  but  what  should  the 
next  contribution  be?  Nonbonded  interactions?  Next  nearest  neighbors?  Bond-bond 
interactions?  While  each  of  these  plays  a  significant  role  to  a  differing  extent,  such 
individual  improvements  suffer  from  the  same  defect,  namely  that,  if  they  cannot 
account  for  all  the  deviations  observed,  the  question  again  arises  as  to  what  should  be 
considered  next.  If  one  wishes  to  develop  a  systematic  multiparametric  approach  to 
SAR,  two  basic  possibilities  present  themselves:  one  considers  either  an  ordered 
sequence  of  numbers,  or  a  collection  (a  set)  of  numbers  (parameters).  Each  choice 
has  certain  advantages,  and  we  now  examine  these  two  alternatives. 

However,  even  when  one  considers  such  a  route,  we  recommend  that  the  invariants 
(parameters)  be  naturally  ordered,  so  that  one  can  successively  introduce  higher 
terms  as  needed  and  eliminate  those  terms  that  are  not  warranted.  One  of  the  advan¬ 
tages  of  the  connectivity  index  is  that  it  allows  such  a  natural  extension  [8].  One  can 
consider  two-bond  fragments,  three-bond  fragments,  and  so  on,  and  introduce  for 
each  the  corresponding  (higher)  connectivity  value.  These  indices  can  subsequently 
be used as a sequence or set of parameters to improve the correlations. Such a
generalization of an index is lacking in the case of both the Hosoya, Z, and Wiener, W,
numbers, as pointed out in the literature [8].

More generally, one can select paths of different length and use path numbers, p_k
(which designate the number of paths of length k in a molecule), as molecular
descriptors [9]. Although the suggestion that path numbers may be useful molecular
descriptors can be traced back to the work of Platt [10], it was not widely recognized or
adopted. Paths are defined as self-avoiding walks, that is, as sequences of vertices
and incident edges in which none is repeated. There are alternative ways of
characterizing structure, for instance, the use of self-returning or random walks rather than
paths. A random walk, on the other hand, allows one to use the same edge and vertex
several times. An advantage of random walks (and self-returning random walks,
which are walks that start and end on the same atom) is that they are easy to
compute (they are given by the entries of A^k, the k-th power of the adjacency matrix),
but they tend to grow rapidly with molecular size. In contrast, path numbers do
not grow so fast, though their computation (except in the case of very small and
simple graphs) is tedious and requires computer use. A program called ALLPATH is
available [10] and gives results for most molecules of chemical interest rapidly. The
problem of a path count is inherently NP-hard (NP standing for nondeterministic
polynomial time): no algorithm is known whose running time grows only polynomially
with the size of the graph (measured in terms of the number of vertices, n), and the
number of paths itself can grow exponentially with n. Thus, in principle, for sufficiently
large graphs, a path count is bound to become impractical. However, for molecules
having up to a half a dozen rings and some 50-150 atoms (which covers a lot of
chemistry!), the count is usually relatively fast. Accordingly, we shall continue to use
path numbers as our basic approach to structure description. The following
characteristics suggest the virtues of path numbers:

Apparently (pictorially) similar structures lead to path sequences that are visually
and analytically similar [11];

Naturally occurring compounds (such as the terpenes) show greater mutual similarity
than they do to artificially constructed structures having the same building
blocks (isoprene units) [11];

Structures  of  similar  biological  activity  show  similarities  when  ordered  according 
to  selected  (most  active)  standards  [12, 13]. 
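
The contrast drawn above between walks and paths can be illustrated with a small sketch. It is not the ALLPATH program: the function names `walk_total` and `path_numbers` and the example graph (the carbon skeleton of 2-methylbutane, with atoms numbered 0-4) are assumptions made for illustration. Walks of length k are read from the k-th power of the adjacency matrix, whereas paths are enumerated explicitly as self-avoiding walks.

```python
# Illustrative sketch only; not the ALLPATH program referred to in the text.

def mat_mult(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_total(adj, k):
    """Total number of walks of length k: the sum of all entries of A^k."""
    n = len(adj)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(k):
        power = mat_mult(power, adj)
    return sum(map(sum, power))


def path_numbers(adj):
    """Return [p0, p1, ...], where p_k is the number of paths of length k."""
    n = len(adj)
    counts = [0] * n

    def extend(end, visited, length):
        counts[length] += 1
        for nxt in range(n):
            if adj[end][nxt] and nxt not in visited:
                extend(nxt, visited | {nxt}, length + 1)

    for start in range(n):
        extend(start, {start}, 0)
    # Every path with at least one bond is found once from each end.
    return [counts[0]] + [c // 2 for c in counts[1:]]


# Hypothetical example: 2-methylbutane skeleton, bonds 0-1, 1-2, 2-3, 1-4.
EDGES = [(0, 1), (1, 2), (2, 3), (1, 4)]
ADJ = [[0] * 5 for _ in range(5)]
for u, v in EDGES:
    ADJ[u][v] = ADJ[v][u] = 1

print("path numbers:", path_numbers(ADJ))
print("walks of length 2 and 3:", walk_total(ADJ, 2), walk_total(ADJ, 3))
```

For this five-atom skeleton there are only two paths of length three, while the number of walks of the same length is already 28, which illustrates why walk counts grow faster than path counts.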


There are also certain drawbacks to the use of path numbers for the characterization
of molecules. If molecules have a similar outside shape (periphery), but differ
significantly in their internal structure (number of rings or bridges), they will have
vastly different counts of paths, even though they possess similar properties. An
example is provided by civetone, a macrocyclic musk, and a sterol that possesses a
decidedly musk-like odor, as observed by Prelog and Ružička [14].


Both  have  a  similar  periphery  and  a  similar  musk  odor,  but  this  similarity  is  not  de¬ 
tectable  using  an  approach  based  on  count  of  paths. 

This kind of problem prompted the need to introduce weighting factors, which can
reduce the role of some paths and thereby enhance the significance of others.
Comparison of selected physicochemical properties of alkanes has shown that short paths,
namely paths of lengths two and three (p2 and p3, respectively), play a dominant role
in dictating the relative magnitudes and variations in properties among isomers
[15-17]. The connectivity index, which focuses on paths of length one (p1, or bonds), has
been shown to be successful in correlating structure with property in the form of a
single-number representation [18, 19]. This index is based on special weightings:
bonds are classified as (m, n) bond types, where m and n indicate the number of
neighbors (valency) for the atoms forming the bond. The bond (m, n) is assigned the
weight 1/(m × n)^{1/2}, and the contributions of different bonds are added together to
yield the molecular connectivity index.
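
As a concrete illustration of this weighting rule, the following sketch computes the connectivity index for the carbon skeletons of the five hexane isomers. The function name `connectivity_index`, the atom numbering, and the bond lists are assumptions introduced here for illustration; hydrogen atoms are suppressed, so the valency of an atom is simply its number of carbon neighbors.

```python
from math import sqrt

# Illustrative sketch only; bond lists are hypothetical encodings of the
# hydrogen-suppressed skeletons, each bond given as a pair of atom labels.


def connectivity_index(bonds):
    """Sum of 1/sqrt(m*n) over all bonds, m and n being the atom valencies."""
    valency = {}
    for a, b in bonds:
        valency[a] = valency.get(a, 0) + 1
        valency[b] = valency.get(b, 0) + 1
    return sum(1.0 / sqrt(valency[a] * valency[b]) for a, b in bonds)


HEXANE_ISOMERS = {
    "n-hexane":           [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)],
    "3-methylpentane":    [(1, 2), (2, 3), (3, 4), (4, 5), (3, 6)],
    "2-methylpentane":    [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6)],
    "2,3-dimethylbutane": [(1, 2), (2, 3), (3, 4), (2, 5), (3, 6)],
    "2,2-dimethylbutane": [(1, 2), (2, 3), (3, 4), (2, 5), (2, 6)],
}

for name, bonds in HEXANE_ISOMERS.items():
    print(f"{name:20s} {connectivity_index(bonds):.4f}")
```

With these weights the index decreases steadily from n-hexane to 2,2-dimethylbutane, that is, with increasing branching.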

In Table I we show a preselected order of alkane isomers having 4-6 carbon
atoms, based on the condition that the numerical values for bond contributions
reproduce the ordering for a selected property. The order selected is the order of relative
magnitudes for a number of thermodynamic molecular properties, including the boiling
points. Hence, it should not be surprising that the connectivity index gives very
good correlations: it has been constructed to do just this by requiring that the ordering
be reproduced. If the ordering is reproduced, as Heilbronner and Schmelzer have
pointed out [20], a relatively high correlation is possible, even with arbitrarily
selected parameters! The particular weights 1/(m × n)^{1/2} represent one solution to the
inequalities, and therefore we here extend their use to weighting paths. The modified
program ALLPATH, which from the input list of atomic neighbors gives the count of
Table I. Decomposition of bonds in smaller alkanes into various (m, n) bond types and associated
inequalities imposed by requiring that the ordering of the isomers follow the ordering of selected
thermodynamic properties (the weighting rule 1/(m × n)^{1/2} is a solution to the inequalities).

C4:  2(1,2) + (2,2) > 3(1,3)

C5:  2(1,2) + 2(2,2) > (1,2) + 2(1,3) + (2,3) > 4(1,4)

C6:  2(1,2) + 3(2,2) > 2(1,2) + (1,3) + 2(2,3) > (1,2) + 2(1,3) + (2,2) + (2,3) >
     4(1,3) + (3,3) > (1,2) + 3(1,4) + (2,4)


weighted paths, is available [10] and suitable for applications in SAR. To each bond
in a graph the weight 1/(m × n)^{1/2} is assigned, and whenever a path contains that
bond its value is multiplied by this factor. Because all such factors are less than one
and  they  are  used  repeatedly,  the  scheme  drastically  reduces  the  values  for  path  num¬ 
bers  of  longer  paths,  giving  more  prominence  to  shorter  paths.  This  appears  to  be  a 
more  natural  way  to  diminish  the  dominant  role  of  paths  of  intermediate  length 
(which  are  the  most  numerous  and  tend  to  obscure  the  role  of  shorter  paths)  than 
mere  truncation. 

An  Illustration 

In Table II we give the computed output of weighted path counts for the molecule
methyl-2-oxopropylnitrosamine (MOP), one of the nitrosamines that we examine
in the next section:

CH3-N(N=O)-CH2-CO-CH3
Table II. Results of the program ALLPATH modified to include the weighting of
bonds as 1/(m × n)^{1/2}.(a)

[Numbered structure diagram: atoms 7-6-5-3-2-1 along the chain, with atom 8 attached to atom 6 and atom 4 attached to atom 3.]

Atom         p0      p1       p2       p3       p4        p5      Atomic ID
1             1    0.8164   0.2721   0.2682   0.0392    0.0474     2.4436
2             1    1.1498   0.3285   0.0481   0.0580               2.5845
3             1    1.3189   0.4165   0.1742                        2.9096
4             1    0.5773   0.4281   0.2404   0.1005               2.3465
5             1    0.7618   0.7985   0.1111                        2.6714
6             1    1.5606   0.1443   0.1314   0.03928              2.8757
7             1    0.5000   0.5303   0.0721   0.0657    0.0196     2.1878
8             1    0.7071   0.6035   0.1020   0.0929    0.0277     2.5334
Molecule:     8    3.6960   1.7610   0.5739   0.1979    0.0474    14.2764

In the last row the entries are, from left to right: the atom count, the molecular
connectivity, the higher connectivities(b) (p2 to p5), and the molecular ID.

(a) Each row gives path numbers for the indicated atom. The sum of the entries in each row gives the
atomic identification (ID) number. Summation along individual columns (and dividing the result by two,
except for the first column) gives molecular path numbers. The first two members in the molecular path
sequence give the number of atoms and the connectivity index, respectively.
(b) Slightly differently defined than in the original reference [8].


Among the molecules considered, this is the most potent mutagen [21]. Our interest
will center primarily on atomic ID numbers, which represent sums of all weighted
paths to individual atoms (when nonweighted paths are used an analogous index has
been considered by Balaban [22] and Seybold [23,24]). In Table II the atomic ID
values are obtained by adding the entries in each row (with each row corresponding
to one atom). If we add the numbers in each column (halving the result for every column
but the first, since each path is counted from both of its ends) we obtain the sequence
of numbers shown at the bottom of Table II. Observe that the first entry (i.e., the sum of
the first column) is the value of the atom count and the second is the connectivity index for
the molecule. Finally, if one adds all the numbers in this bottom sequence, one
obtains the molecular ID number, indicated also in the table (as the
total number of weighted paths).
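
The row and column sums just described can be sketched as follows. This is not the ALLPATH program, and the encoding rests on assumptions that are inferred from the numbers rather than stated in the text: MOP is taken as O(1)=N(2)-N(3)(-CH3(4))-CH2(5)-C(6)(=O(8))-CH3(7), the valency of an atom is the sum of its bond orders, and a double bond is counted as two parallel bonds (so its weight is doubled). The names `MOP_BONDS`, `weighted_path_table`, `atomic_ids`, and `molecular_path_numbers` are hypothetical.

```python
from math import sqrt

# Illustrative sketch only. Bonds are (atom, atom, bond order); the numbering
# and the treatment of double bonds are assumptions, see the lead-in above.
MOP_BONDS = [(1, 2, 2), (2, 3, 1), (3, 4, 1), (3, 5, 1),
             (5, 6, 1), (6, 7, 1), (6, 8, 2)]


def weighted_path_table(bonds):
    """Return {atom: [p0, p1, ...]} of weighted path numbers for each atom."""
    valence = {}
    for a, b, order in bonds:
        valence[a] = valence.get(a, 0) + order
        valence[b] = valence.get(b, 0) + order
    weight = {atom: {} for atom in valence}
    for a, b, order in bonds:
        w = order / sqrt(valence[a] * valence[b])
        weight[a][b] = w
        weight[b][a] = w

    def extend(end, visited, length, product, row):
        # Record the current self-avoiding path, then try to lengthen it.
        row[length] += product
        for nxt, w in weight[end].items():
            if nxt not in visited:
                extend(nxt, visited | {nxt}, length + 1, product * w, row)

    table = {}
    for start in sorted(valence):
        row = [0.0] * len(valence)
        extend(start, {start}, 0, 1.0, row)
        table[start] = row
    return table


def atomic_ids(table):
    """Atomic ID: the sum of the entries in each row."""
    return {atom: sum(row) for atom, row in table.items()}


def molecular_path_numbers(table):
    """Column sums, halved for every column except the first."""
    n = len(table)
    columns = [sum(row[k] for row in table.values()) for k in range(n)]
    return [columns[0]] + [c / 2 for c in columns[1:]]


table = weighted_path_table(MOP_BONDS)
molecular = molecular_path_numbers(table)
print("atomic IDs:", {a: round(v, 4) for a, v in atomic_ids(table).items()})
print("molecular path numbers:", [round(x, 4) for x in molecular[:6]])
print("molecular ID:", round(sum(molecular), 4))
```

Under these assumptions the sketch gives a connectivity of about 3.696 and a molecular ID of about 14.276, in line with the values quoted for MOP.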

The above molecule can now be represented in one of the following ways:

Molecular ID: 14.2764
Connectivity: 3.6960
Weighted paths: 8, 3.6960, 1.7610, 0.5739, 0.1979, 0.0474
Set of atomic IDs: 2.4436; 2.5845; 2.9096; 2.3465; 2.6714; 2.8757; 2.1878;
2.5334 (Elements of the set, of course, can be displayed in any
arbitrary order.)

Each of these descriptors, as well as the corresponding unweighted quantities, has been
used in SAR applications. ID numbers have been used to define nonempirical clusterings
of selected therapeutically useful drugs [25], weighted paths are used in similarity
studies [26,27], and atomic ID numbers in a search for antitumor agents [28].
Here we pursue a similar theme: our search, however, is not for an optimal
pound,  but  for  a  substructure  responsible  for  the  bioactivity  in  a  family  of  structures. 
As  will  be  seen,  atomic  ID  numbers  are  well  suited  for  such  an  application  because 
they  allow  some  flexibility  in  probing  various  molecular  fragments. 

Searching  for  Active  Substructures 

If  the  compounds  considered  are  structurally  similar  and  differ  only  modestly  in 
composition  and  arrangement  of  their  various  parts,  a  graph-theoretical  approach 
based  on  path  enumeration  may  be  appropriate  for  comparative  study  and  analysis.  If 
the molecules are of different sizes, even when the same pharmacophore is responsible
for  their  activity,  any  similarity  in  the  path  counts  can  be  obscured  by  the  role  of  ex¬ 
traneous  atoms.  Clearly  one  then  has  to  restrict  the  count  of  paths  to  common  atoms 
in  all  the  members  of  the  family.  But  even  such  empirical  or  visual  selection  of  com¬ 
mon  atoms  may  still  contain  irrelevant  atoms  whose  paths  reduce  the  signal-to-noise 
ratio.  Often  a  trial-and-error  approach  needs  to  be  adopted.  Here  we  will  outline  the 
use  of  weighted  paths  in  a  systematic  search  for  a  molecular  fragment  (common 
within  a  family  of  compounds  considered)  that  can  be  identified  as  the  underlying, 
active  portion  of  the  molecules  under  consideration. 

We  have  selected  a  set  of  nitrosamine  mutagens  (mostly  propyl  derivatives)  be¬ 
cause  they  have  been  previously  examined  and  so  comparison  is  possible  with  alter¬ 
native  approaches.  Moreover,  these  structures  are  sufficiently  general  to  illustrate  the 
method. The compounds are shown in Figure 1, abbreviated as in Ref. 21, where interested
readers can find their full chemical names and other details. The structures
have  been  ordered  in  Figure  1  according  to  their  relative  mutagenic  potencies:  MOP 
is  the  most  potent  with  a  relative  mutagenic  activity  of  650.  (The  data  refer  to  muta¬ 
genic  activities  in  the  hamster  hepatocyte-mediated  V79  cell  mutagenesis  system  at  a 
concentration  of  0.7  mM,  and  are  expressed  as  numbers  of  ouabain-resistant  mutants 
per  104  surviving  V79  cells  [21].)  Table  III  lists  atomic  ID  numbers  for  seven  atoms, 
all  numbered  to  overlap  the  numbering  shown  for  MOP  in  Table  II.  When  extending 
the  path  counts  to  atoms  6  and  7  there  are  some  minor  problems  due  to  the  ambiguity 
of alternative labelings. We have shown both alternatives, and, as discussed later,
when  a  choice  exists  we  have  selected  the  one  that  optimizes  similarity.  DMN,  of 
course,  has  only  5  atoms,  and  is  excluded  when  considerations  extend  to  fragments 
with  6  or  more  atoms. 

One can see by inspection of Table III that values for the atomic IDs do not vary
dramatically within each column, except when the atom in question has (or has not,
while others have) another neighbor (substituent). Thus, all the values in the first
column are roughly 2.45-2.50, the next column has somewhat larger values, 2.55-2.65,
the next has values lying between 2.90 and 3.35, and so on. Yet there are variations,
and the nature of these path numbers suggests a small but significant role for more
distant neighbors. We now wish to use these minor variations as a basis for tests of
relative similarity among different compounds.

Figure 1. The simplified molecular diagrams for the 13 nitrosamines considered. Their
mutagenic activities are shown by the numerical values; the abbreviations are those
used in Ref. 21.

Rather  than  examining  all  similarities,  we  shall  restrict  our  attention  here  to  the  rela¬ 
tive  similarities  of  the  compounds  to  MOP  and  MHP,  the  two  compounds  with  the 
highest  mutagenic  activities  (650  and  380,  respectively).  Table  IV  shows  numerical 
estimates  of  the  relative  similarities  of  the  compounds  to  these  two  standards.  The  en¬ 
tries  in  Table  IV  are  obtained  by  viewing  the  collection  of  atomic  ID  numbers  for 
each  structure  as  a  vector,  and  taking  a  Euclidean  n-dimensional  metric  to  find  the 
distance  between  such  vectors.  Consider  the  “distance”  (i.e. ,  measure  of  similarity  or 
dissimilarity)  between  MOP  and  MHP  (restricting  attention  here  to  the  first  5  atoms): 


MOP  2.443  2.584  2.909  2.346  2.671
MHP  2.454  2.598  2.950  2.369  2.770

One takes the difference between each pair of values and squares it, thus:

$$0.011^2 + 0.014^2 + 0.041^2 + 0.023^2 + 0.099^2 .$$

Upon adding these contributions one obtains the result 0.0123, the square root of
which is 0.1110, the number used for comparison of the structures. As may be seen
from Table IV (which gives such results for atoms 1-5, for atoms 1-6, and finally for
atoms 1-7) all the numbers are relatively small and differ at most by an order of
magnitude. Nevertheless, differences do exist, and, as will be seen, these reflect
differences in the structures considered.

Table III. Atomic ID numbers for the 13 nitrosamines considered.*

                              Atom
Molecule    1      2      3      4      5      6, 7 (alternatives in printed order)
MOP       2.443  2.584  2.909  2.346  2.671    2.875  2.187
MHP       2.454  2.598  2.950  2.369  2.770    2.926  2.356
DMN       2.402  2.534  2.760  2.260  2.260    -      -
BOP       2.484  2.634  3.054  2.732  2.732    2.897  2.198
2-MOB     2.447  2.584  2.924  2.355  2.708    2.979  2.547
MP        2.451  2.594  2.939  2.363  2.744    2.652  2.375
HPOP      2.495  2.647  3.099  2.749  2.831    2.903  2.951  2.201  2.370
POP       2.492  2.644  3.089  2.744  2.805    2.683  2.397  2.200
3-MOB     2.455  2.594  2.594  2.372  2.781    2.725  2.894
2-HPP     2.503  2.657  3.129  2.844  2.822    2.691  2.403  2.373
DP        2.500  2.654  3.118  2.817  2.817    2.684  2.401
BHP       2.506  2.661  3.140  2.848  2.848    2.958  2.374
3-HPP     2.508  2.664  3.148  2.830  2.891    2.835  2.698  2.405

* Only data for the 7 common nonhydrogen atoms are represented. In some cases
there are two alternative choices for the 7 atoms and both alternatives are shown.
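The distance calculation just described can be checked with a few lines of Python. This is only a minimal sketch: the two vectors are the atomic IDs for atoms 1-5 of MOP and MHP from Table III, and the function name `id_distance` is an illustrative choice, not something taken from the original work.

```python
import math

# Atomic ID numbers for atoms 1-5 of the two reference compounds (Table III).
MOP = [2.443, 2.584, 2.909, 2.346, 2.671]
MHP = [2.454, 2.598, 2.950, 2.369, 2.770]


def id_distance(u, v):
    """Euclidean distance between two equal-length vectors of atomic IDs."""
    if len(u) != len(v):
        raise ValueError("vectors must cover the same atoms")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


distance = id_distance(MOP, MHP)
print(f"{distance:.4f}")  # 0.1110, the value quoted in the text
```

Extending the comparison to 6- or 7-atom fragments only requires longer vectors; the function itself does not change.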

In order to make useful deductions from Table IV, we now examine the cases of
5-, 6-, and 7-atom fragments by ordering the compounds relative to the selected
standards. This is, admittedly, a step in the direction from quantitative to qualitative
analysis. But later on we shall revisit the data and examine them quantitatively.
Ordering will give us some insights into the complexities and difficulties of correlating
data and structures. In Figures 2-4 we show for the three cases (fragments having 5,
6, and 7 atoms) an ordering of the structures with respect to MOP (top row) and
MHP (bottom row). After arriving at these orderings (as previously outlined in several
papers on path comparison of structures in SAR [12]), one uses a line to connect
the same structures. From such a diagram it is not difficult to extract all the partial
orders, and these are shown by diagrams in which we have inserted the values of the
mutagenic activities, rather than the molecular symbols (except for the case of the
5-atom fragments, where we have for the sake of clarity shown both). If all the numbers
in a partially ordered diagram were to follow monotonically from left to right,
we could claim that the approach points to a specific fragment as the substructure
responsible for bioactivity. A look at Figure 2 is, however, disappointing: BOP (250)
and DMN (320) do not fit the scheme at all, and in addition, the small value of 30
(for 3-MOB) appears “too early” in the diagram. Observe, however, that when we
add an additional (sixth) atom (Fig. 3), a considerably better diagram results. Of
course, we had to eliminate DMN (which has only 5 nonhydrogen atoms), but although
the diagram is improved, there are still several discrepancies. Now the values
105 and 30 are “pushed” a bit to the right, and those of HPOP and POP (90 and 75,
respectively) are ordered correctly. All this suggests that atom 6 is relevant, though
we still cannot claim a clear-cut result.

MOP  2-MOB  MP  MHP  3-MOB  BOP  DMN  POP  HPOP  BHP  DP  2-HPP  3-HPP
MHP  3-MOB  MP  2-MOB  MOP  BOP  POP  HPOP  DP  2-HPP  3-HPP  BHP  DMN

Figure 2. Ordering of the nitrosamines relative to MOP and MHP (top) and the derived
partial order (constructed by first pairing same labels in the top two rows and then
examining relations for which no crossing of lines occurs). The bottom diagram replaces
molecular species by their mutagenic activity values. Ordering is based on common
5-atom fragments.

MOP  2-MOB  MHP  MP  3-MOB  BOP  HPOP  POP  BHP  DP  3-HPP  2-HPP
MHP  2-MOB  MOP  3-MOB  MP  BOP  HPOP  POP  3-HPP  DP  BHP  2-HPP

Figure 3. Ordering of nitrosamines based on 6-atom fragments and the corresponding
partial order with mutagenic activity values displayed.

MOP  MHP  2-MOB  BOP  HPOP  POP  BHP  MP  DP  2-HPP  3-HPP  3-MOB
MHP  2-MOB  MOP  MP  BOP  HPOP  POP  3-HPP  DP  BHP  2-HPP  3-MOB

Figure 4. Ordering of nitrosamines based on 7-atom fragments and the corresponding
partial order with mutagenic activity values displayed.
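The construction of a partial order from the two orderings can be sketched in Python. The sketch assumes that one structure precedes another in the partial order exactly when it comes first in both rows, which is how the "no crossing of lines" rule in the Figure 2 caption is read here. The two lists are the rows of Figure 2 with labels normalized, the activities are only those quoted in the text, and the function names are illustrative.

```python
# Orderings read from Figure 2 (5-atom fragments), most similar first.
BY_MOP = ["MOP", "2-MOB", "MP", "MHP", "3-MOB", "BOP", "DMN", "POP",
          "HPOP", "BHP", "DP", "2-HPP", "3-HPP"]
BY_MHP = ["MHP", "3-MOB", "MP", "2-MOB", "MOP", "BOP", "POP", "HPOP",
          "DP", "2-HPP", "3-HPP", "BHP", "DMN"]

# Mutagenic activities quoted in the text; the others appear only in Figure 1.
ACTIVITY = {"MOP": 650, "MHP": 380, "DMN": 320, "BOP": 250, "2-MOB": 210,
            "MP": 105, "HPOP": 90, "POP": 75, "3-MOB": 30}


def partial_order(first, second):
    """Return the pairs (a, b) for which a precedes b in both orderings."""
    pos1 = {name: i for i, name in enumerate(first)}
    pos2 = {name: i for i, name in enumerate(second)}
    return {(a, b) for a in first for b in first
            if pos1[a] < pos1[b] and pos2[a] < pos2[b]}


def violations(order, activity):
    """Ordered pairs whose known activities rise instead of falling."""
    return sorted((a, b) for a, b in order
                  if a in activity and b in activity
                  and activity[a] < activity[b])


order = partial_order(BY_MOP, BY_MHP)
bad_pairs = violations(order, ACTIVITY)
print(("MOP", "MHP") in order)      # False: the two standards are not comparable
print(("BOP", "DMN") in bad_pairs)  # True: BOP precedes DMN but is less active
```

Running the same functions on the rows of Figures 3 and 4 gives a simple way to count how many ordered pairs still conflict with the activity data as the fragment grows.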

Extending  considerations  to  a  fragment  having  seven  atoms,  we  obtain  the  diagram 
shown  in  Figure  4.  It  shows  dramatic  improvement:  observe  that  3-MOB  with  its  low 
value  (30)  has  been  pushed  to  the  end  of  the  diagram  and  the  relative  orders  of 
BOP (250) and MP (105), which were previously incorrect, have now been resolved
by placing them in different branches of the lattice. Although the diagram of Figure 4
is still not “perfect” [e.g., there remains an inversion of BOP (250) and 2-MOB (210)],
if we view these discrepancies as minor (recalling the considerable experimental
uncertainty) we can see definite progress and claim that the approach now appears to
have  captured  the  main  features  of  the  pertinent  molecular  fragment  responsible  for 
biological  (mutagenic)  activity. 

In  order  to  see  even  better,  and  on  a  quantitative  scale,  the  substance  of  the  present 
approach,  we  illustrate  in  Figures  5-7  the  relationships  between  the  mutagenic  activi¬ 
ties  (y  axis)  and  similarities  to  MOP  (the  most  potent  mutagenic  structure,  x  axis). 
For  an  ideal  theoretical  model  one  should  obtain  a  correlation  curve  which  decreases 
from  the  top  left  of  the  figure  to  bottom  right.  Instead,  in  Figure  5  we  see  a  widely 
scattered  set  of  points  in  the  middle  of  the  figure,  indicating  a  failure  of  the  assump¬ 
tion  that  a  5-atom  fragment  can  explain  the  mutagenic  activity.  In  Figure  6,  when  a 
sixth  atom  is  taken  into  account,  we  notice  some  narrowing  of  the  middle  section, 
with  the  same  three  structures  (BOP,  2-MOB,  and  MP)  as  outliers.  Figure  6  shows 
improvement  over  Figure  5,  but  remains  unsatisfactory.  Finally,  in  Figure  7,  we  see 
an  essentially  valid  correlation,  with  perhaps  only  3-MOB  at  the  end  of  the  plot  being 
somewhat  out  of  line.  This  latter  observation  does  not  represent  serious  misbehavior, 
in part because we are primarily interested in (i) similarity to the most potent
compound, MOP; and (ii) the question whether activity can be traced to a fragment
present in the most potent compounds. If a compound shows little similarity to the
standard (as is the case with 3-MOB, which is the structure least similar to MOP)
then other factors may influence its activity and cause observed deviations.

Figure 5. Relationships between mutagenicities and similarities to MOP, based on 5-atom
fragments.
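One way to put a number on the scatter seen for the 5-atom fragments is a rank correlation between distance to MOP and mutagenic activity. The sketch below is an illustration, not the authors' analysis: it uses only the compounds whose activities are quoted in the text and whose Table III entries for atoms 1-5 are unambiguous (3-MOB is left out because its atom-3 entry looks doubtful in the available copy), and the Spearman formula used assumes there are no ties.

```python
import math

# Atomic IDs for atoms 1-5 (Table III) and activities quoted in the text.
IDS = {
    "MOP":   [2.443, 2.584, 2.909, 2.346, 2.671],
    "MHP":   [2.454, 2.598, 2.950, 2.369, 2.770],
    "DMN":   [2.402, 2.534, 2.760, 2.260, 2.260],
    "BOP":   [2.484, 2.634, 3.054, 2.732, 2.732],
    "2-MOB": [2.447, 2.584, 2.924, 2.355, 2.708],
    "MP":    [2.451, 2.594, 2.939, 2.363, 2.744],
    "HPOP":  [2.495, 2.647, 3.099, 2.749, 2.831],
    "POP":   [2.492, 2.644, 3.089, 2.744, 2.805],
}
ACTIVITY = {"MOP": 650, "MHP": 380, "DMN": 320, "BOP": 250,
            "2-MOB": 210, "MP": 105, "HPOP": 90, "POP": 75}


def distance(u, v):
    """Euclidean distance between two vectors of atomic IDs."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def ranks(values):
    """Rank of each value (1 = smallest); assumes all values are distinct."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for samples without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


names = sorted(IDS, key=lambda m: distance(IDS[m], IDS["MOP"]))
dists = [distance(IDS[m], IDS["MOP"]) for m in names]
rho = spearman(dists, [ACTIVITY[m] for m in names])
print(names)          # compounds from most to least similar to MOP
print(round(rho, 3))  # -0.595 for this subset
```

A value of -1 would correspond to the ideal monotonically decreasing curve described above. The same calculation with 6- or 7-atom vectors would show whether the correlation tightens, provided the ambiguous labelings of atoms 6 and 7 are resolved first.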

Conclusions 

Figures 4 and 7 suggest that for the dozen nitrosamines considered, one can at least
partly  associate  mutagenic  potency  with  the  presence  of  a  particular  7-atom  substruc¬ 
ture.  The  limited  success  of  other  approaches,  for  example,  use  of  the  connectivity 
index  (which  gives  a  fair  correlation),  may  well  lie  in  the  fact  that  the  nitrosamines 
have  superfluous  atoms  which  may  reduce  the  signal-to-noise  ratio.  We  have  de¬ 
scribed  how  a  search  for  an  active  fragment  can  be  carried  out.  Our  outline  employs 
nitrosamines  as  an  illustration,  but  the  methodology  can  readily  be  applied  to  other 
cases,  and  in  particular  to  situations  in  which  the  bioactive  fragment  is  not  known. 
Such cases may require a trial-and-error approach, but with the accumulation of
relevant information it should be possible to restrict the number of candidate fragments
and determine which of these is most likely to be the underlying bioactive component.

Figure 7. Correlation between the mutagenicities of the nitrosamines and their similarities
to MOP, based on the 7-atom fragments. Observe the improvement in the correlation
relative to that in Figures 5 and 6.

Abstract

Briefly noting earlier studies on the polypentapeptide of elastin, (Val¹-Pro²-Gly³-Val⁴-Gly⁵)ₙ, and on
elastin,  it  is  emphasized  that  entropic  elastomeric  force  can  be  exhibited  by  nonrandom,  anisotropic 
polypeptide  systems  and  therefore  that  entropic  elastomeric  force  does  not  necessarily  result  from  isotropic 
random  chain  networks  as  required  by  the  classical  theory  of  rubber  elasticity,  nor  does  it  result  from  sol¬ 
vent  entropy  effects  as  deduced  from  the  slow  loss  of  elastomeric  force  on  thermal  denaturation.  Instead 
entropic  protein  elasticity  can  be  the  result  of  internal  chain  dynamics,  specifically  of  librational  processes 
that  become  damped  on  chain  extension.  This  new  mechanism  of  entropic  protein  elasticity  allows  for 
an  understanding  not  only  of  elastin  but  also  of  the  passive  tension  of  striated  muscle,  of  the  voltage- 
dependent  interconversion  between  open  and  closed  conductance  states  in  the  sodium  channel  of  squid 
nerve,  and  of  protein  elastic  forces  producing  strain  in  a  substrate  bond  during  enzyme  catalysis.  Because 
entropic  elastomeric  force  develops  as  a  result  of  an  inverse  temperature  transition,  it  becomes  possible  to 
shift  the  temperature  of  the  transition  to  higher  or  lower  temperatures  by  decreasing  or  increasing,  respec¬ 
tively,  the  hydrophobicity  of  the  elastomeric  polypeptide  chain.  In  warm-blooded  animals  this  allows  for 
biochemical  modulation  of  the  relaxation  or  development  of  entropic  elastomeric  force  by  an  enzymatically 
modulated  decrease  or  increase  of  the  hydrophobicity,  as  for  example,  by  phosphorylation  or  dephosphory¬ 
lation  of  the  elastomeric  polypeptide  chain.  This  understanding  provides  a  mechanism  for  modulating 
protein  function,  whether  for  example  enzymatic  or  channel,  a  mechanism  for  the  remarkable  reversible 
structural  processes  that  attend  parturition,  and  a  mechanism  for  the  connective  tissue  anomalies  of  wound 
repair  and  environmentally  induced  lung  disease. 

Introduction 

Presently recognized as the primary elastomeric protein of warm-blooded animals,
elastin is the second most prevalent protein in the extracellular matrix; only collagen
is more common [1]. The nature of the elastomeric force was demonstrated by Hoeve
and Flory in 1958 to be dominantly entropic in origin [2]. This is an important statement
as it provides an understanding of the durability of elastin, where single elastin
fibers can last the lifetime of an individual, which in the human vascular system
means undergoing more than one billion stress/strain cycles. That elastin is a
dominantly entropic elastomer was reaffirmed by Hoeve and Flory in 1974 when they
continued to insist that “A network of random chains within elastin fibers, like that in
a typical rubber, is clearly indicated” [3]. This perspective has dominated thinking
with respect to protein elasticity for nearly three decades and remains a staunchly
held perspective [4-11]. Accordingly, the insistence that entropic elastomeric force
requires a random network of chains has precluded application to protein systems
known to be nonrandom chain networks.

Studying  the  molecular  structure  and  function  of  the  polypentapeptide  of  elastin, 
this  laboratory  has  demonstrated  a  new  mechanism  of  entropic  elasticity  for  this  most 
striking  primary  structural  feature  of  elastin  [12],  occurring  within  the  longest  se¬ 
quence  between  cross-links,  a  sequence  twice  as  long  as  any  other  possible  elas¬ 
tomeric  sequence  between  cross-links  [12, 13],  and  has  demonstrated  its  applicability 
to  the  elastin  fiber  as  a  whole  [14, 15].  The  mechanism  derives  from  internal  chain 
dynamics  and  is  called  the  librational  entropy  mechanism  of  elasticity.  In  this  report 
the  new  mechanism  of  entropic  elasticity  is  considered  relative  to  other  protein  sys¬ 
tems  where  elastomeric  force  is  implicated  but  where  the  proteins  cannot  be  de¬ 
scribed  as  random  chain  networks. 

In  particular,  the  identification  and  possible  origins  of  entropic  elastomeric  force 
are  considered  briefly.  The  applicability  of  internal  chain  dynamics,  that  is,  libra¬ 
tional  processes,  to  protein  elasticity  as  newly  understood  in  elastin  is  extended  to  an 
understanding  of  the  passive  tension  in  muscle,  of  changing  conductance  states  of 
channels,  and  of  enzyme  mechanisms.  Furthermore,  the  relevance  of  structural  tran¬ 
sitions  to  and  from  the  elastomeric  state  is  considered  in  regard  to  elastogenesis, 
wound  repair,  fibrotic  lung  disease,  and  to  processes  attending  parturition  and  their 
reversal,  that  is,  cervical  ripening  and  pubic  ligament  formation. 


Possible  Origins  of  Entropic  Elastomeric  Force  in  Proteins 

Elasticity, of course, is the property whereby a material resists and recovers from
deformation. The elastomeric force, $f$, can be considered to be comprised of two
components: an internal energy component, $f_e$, and an entropy component, $f_s$, or

$$f = f_e + f_s \tag{1}$$


Following Flory and colleagues, the relative magnitudes of the internal energy and
entropy components can be determined by means of thermoelasticity studies [16]. In
these studies the elastomer is extended to a fixed length and the elastomeric force is
measured as a function of temperature. A plot of $\ln[f/T(\mathrm{K})]$ versus temperature
allows evaluation of the $f_e/f$ ratio, or

$$\frac{f_e}{f} = -T\left(\frac{\partial \ln(f/T)}{\partial T}\right)_{P,L,\mathrm{eq}} - \frac{\beta_{\mathrm{eq}}\,T}{\alpha^{3}(V_i/V) - 1} \tag{2}$$

where the experiment is carried out at constant pressure, $P$, with the elastomer at
fixed length, $L$, and with the elastomeric matrix in equilibrium, eq, with the solvent.
The second term in Eq. (2) is a correction term allowing the analysis to proceed at
constant pressure rather than constant volume, and in equilibrium with solvent rather
than at constant composition [17]. In this term
$\beta_{\mathrm{eq}} = (\partial \ln V/\partial T)_{P,L,\mathrm{eq}}$ is the thermal
expansion coefficient; $\alpha$ is the fractional increase in length; and $V_i$ and $V$ are the
elastomer volumes before and after elongation. This correction term is of the order of
0.1 for elastin [18] as well as for the polypentapeptide of elastin [19]. Figure 1 shows
thermoelasticity studies for elastin and for the polypentapeptide of elastin, where
particularly for the latter, the near zero slope argues for a dominantly entropic
elastomeric force [20]. On changing the solvent to ethylene glycol-water, 3:7 by volume,
the rapid rise in elastomeric force is shifted to lower temperature and the near zero
slope becomes more apparent for elastin [unpublished data, 2,3]. Furthermore, a
near zero slope for elastin has been found in triethylene glycol [10]. Thus elastin and
the polypentapeptide of elastin are considered to be dominantly entropic elastomers.

ENTROPIC ELASTOMERIC FORCE IN PROTEIN STRUCTURE

Figure 1. Thermoelasticity studies: Temperature dependence of elastomeric force at fixed
extension. (A) Polypentapeptide of elastin cross-linked by 20 Mrads of γ-irradiation while
in the coacervate state, which is obtained by raising the temperature of solutions of
polypentapeptide plus water from 20°C to 40°C to form a dense viscoelastic phase that is 62%
water, 38% peptide by weight. The sample is extended to 60% at 40°C and then the force is
measured as a function of temperature. In going from 20 to 40°C there is an abrupt
development of elastomeric force, but above 40°C the plot of ln[force/T(K)] versus temperature
exhibits a near zero slope. Since the slope is proportional to the $f_e/f$ ratio and since this is
near zero, it can be argued that the polypentapeptide of elastin exhibits dominantly entropic
elastomeric force in the temperature range above 40°C. The development of elastomeric
force in the 20-40°C range correlates with an inverse temperature transition observed by
numerous physical methods and seen to be a process of self-assembly into fibers. Therefore the
polypentapeptide of elastin is an anisotropic, entropic elastomer. (B) Ligamentum nuchae
elastin exhibits a similar development of elastomeric force on raising the temperature over a
somewhat broader temperature range, but at higher temperatures the slope approaches zero
and a dominantly entropic elastomer has been concluded. This conclusion is assisted by
carrying out the study in 30% ethylene glycol in water, which shifts the transition to lower
temperature, giving a wider temperature range where the slope is near zero. In both cases there
is plotted on the right-hand side the temperature profile for aggregation, actually for fiber
formation as observed by microscopy, for the constituent peptide or protein. Right ordinate:
aggregation profile; left ordinate: thermoelasticity data. Reproduced with permission from
Ref. 20.
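The thermoelasticity analysis lends itself to a short numerical illustration. The Python sketch below estimates only the first term of Eq. (2), that is, minus the temperature times the slope of ln(f/T) versus T, from a least-squares line through force readings at fixed length. The correction term (of the order of 0.1 according to the text) is deliberately not included, the temperatures and forces are synthetic values invented for the example, and the function name is hypothetical.

```python
import math


def energy_fraction(temps_k, forces):
    """Estimate f_e/f as -T * d ln(f/T)/dT from a least-squares slope.

    temps_k: absolute temperatures; forces: forces at fixed length.
    The correction term of Eq. (2) is not applied here.
    """
    y = [math.log(f / t) for f, t in zip(forces, temps_k)]
    n = len(temps_k)
    t_mean = sum(temps_k) / n
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in temps_k)
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in zip(temps_k, y))
    slope = sxy / sxx
    return -t_mean * slope


# Synthetic readings above the transition (illustrative values only).
temps = [313.0, 323.0, 333.0, 343.0]
entropic = [0.01 * t for t in temps]  # force proportional to T
energetic = [3.0 for _ in temps]      # force independent of T

print(round(energy_fraction(temps, entropic), 3))   # close to 0: purely entropic
print(round(energy_fraction(temps, energetic), 3))  # close to 1: purely energetic
```

The two limiting cases show why a near zero slope of ln(f/T) is read as a dominantly entropic force: when the force scales with absolute temperature, the internal energy component vanishes.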

The Classical Theory of Rubber Elasticity for Random Chain Networks

The classical or statistical theory of rubber elasticity holds that entropic elastomeric
force derives from random chain networks [21-23]. At rest the network is described
as being comprised of a random distribution of chain end-to-end lengths. This is the
highest entropy state. On stretching, the distribution of end-to-end lengths is
displaced from that of highest entropy. This decrease in entropy provides the resistance
to deformation and the driving force for recovery. A representative distribution of
chain end-to-end lengths is given in Figure 2 where $W(r)$ is the probability
distribution of the end-to-end lengths, $r$, in nm. In this theory the $f_e/f$ ratio is given by
$T\,d\ln\langle r^2\rangle_0/dT$, where $\langle r^2\rangle_0$ is the mean square end-to-end chain length.
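For readers who want to reproduce a curve of the kind described for a random chain, the following Python sketch evaluates the standard Gaussian approximation for the end-to-end distribution of a freely jointed chain. It is an illustration under stated assumptions: the Gaussian form holds only for lengths much shorter than the fully extended chain, the segment count of 10,000 comes from the Figure 2 caption, and the segment length of 0.25 nm is a reading of a partly illegible value there. All function names are hypothetical.

```python
import math


def w_gaussian(r, n_segments, segment_length):
    """Gaussian radial distribution W(r) for a freely jointed chain."""
    mean_square = n_segments * segment_length ** 2  # <r^2>_0 = n * l^2
    norm = (3.0 / (2.0 * math.pi * mean_square)) ** 1.5
    return 4.0 * math.pi * r * r * norm * math.exp(-3.0 * r * r / (2.0 * mean_square))


def integrate(func, upper, steps):
    """Trapezoidal integral of func from 0 to upper."""
    h = upper / steps
    total = 0.5 * (func(0.0) + func(upper))
    for i in range(1, steps):
        total += func(i * h)
    return total * h


N_SEGMENTS = 10_000
SEGMENT_NM = 0.25  # assumed value, see the note above

# Most probable end-to-end length: maximum of W(r).
r_peak = math.sqrt(2.0 * N_SEGMENTS * SEGMENT_NM ** 2 / 3.0)
area = integrate(lambda r: w_gaussian(r, N_SEGMENTS, SEGMENT_NM), 250.0, 5000)

print(round(r_peak, 2))  # about 20.41 nm
print(round(area, 3))    # about 1.0, so W(r) is normalized
```

Stretching a network displaces the chains from this distribution, and the resulting loss of entropy is what the classical theory identifies as the restoring force.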

Solvent  Entropy 

When  the  elastomer  is  comprised  of  hydrophobic  groups  that  become  exposed  to 
polar  solvents  such  as  water  on  extension,  several  workers  [24-27]  have  suggested 
that  the  formation  of  clathrate-like  water  surrounding  these  exposed  hydrophobic 
groups  constitutes  a  decrease  in  entropy  that  would  provide  an  entropic  restoring 
force. 

Internal Chain Dynamics: Librational Process

Another source of decrease in entropy on extension has been derived from studies
on the polypentapeptide of elastin [14, 15, 28-31] but it is an entirely general
mechanism. It asserts that chain segments within a bulk matrix have freedom to undergo
rocking motions. Since the chain segments in the dense, cross-linked bulk matrix will
be essentially immobilized at their ends, motion occurs by rotation about one bond
being paired with compensating rotations about one or more other bonds such that
rocking motions or librational processes occur. On stretching these librational
motions become damped. This has been termed the librational entropy mechanism of
elasticity [29].

Figure 2. Probability distribution, W(r), of chain end-to-end lengths r in nm. The solid
line gives the distribution for a freely jointed chain with 10,000 segments of 0.25 nm each
[22]. This is a random distribution of end-to-end lengths representing the highest entropy
state. On stretching of a bulk cross-linked matrix of such a collection of chains, the
distribution is displaced from that of a random chain network. The decrease in entropy provides a
resistance to deformation and a restoring force. This is a description of the classical theory
of rubber elasticity. The dashed curve represents a possible distribution of chain end-to-end
lengths where the chains are nearly the same length. In this case an entropic restoring force
can derive from the damping of internal chain dynamics on extension. This has been referred
to as the librational entropy mechanism of elasticity which, as represented, can occur with
anisotropic fibrillar elastomers.


Elastomeric  Processes  in  Protein  Systems 

The  Polypentapeptide  of  Elastin 

As shown in Figure 1(A), when the polypentapeptide of elastin is γ-irradiation
cross-linked at a concentration of about 40% peptide, 60% water by weight, the resulting
elastomer exhibits dominantly entropic elastomeric force above 40°C. On raising the
temperature from 20° to 40°C, however, there is a dramatic development of
elastomeric force. This development has been demonstrated by five independent physical
methods (nuclear magnetic resonance structural and relaxation studies, dielectric
relaxation studies, circular dichroism studies, microscopic characterization, and
composition studies) to correlate with development of molecular order, that is, to
correlate with an inverse temperature transition [14, 19, 32]. In the 20-40°C temperature
range, development of molecular order correlates with development of elastomeric
force. That the entropically elastomeric state above 40°C is an ordered state
is further demonstrated by thermal denaturation followed by circular dichroism [19],
by extrusion of water [15, 19] and most directly by the slow loss of elastomeric force
and of elastic modulus [15, 33], all demonstrated by heating at 80°C. As the
elastomeric state is not a random chain network and since at 80°C destructuring of
clathrate-like water would occur with a time constant of the order of nanoseconds or
less whereas the loss of elastic modulus at 80°C occurs with a half-life of days, the
entropic elastomeric force must be due to internal chain dynamics.

The  proposed  elastomeric  structure  of  the  polypentapeptide  of  elastin  is  given  in 
Figure  3  [31 , 34, 35, 28]  and  the  effect  of  stretching  on  the  damping  of  the  librational 


(a)  /3-turn  perspective 


(b) 


schematic  representations 


Figure  3.  Proposed  conformation  of  the  elastomeric  state  of  the  polypentapeptide  of 
elastin:  (A)  0-tum  perspective  showing  the  10-atom  hydrogen-bonded  ring  which  utilizes 
the  Val'C — 0  --HN  Val*  hydrogen  bond.  This  conformation  was  first  developed  in  solu¬ 
tion  using  nmr  methods  and  then  demonstrated  in  the  crystal  for  the  cyclopentadecapeptide 
which  was  shown  to  be  the  cyclic  conformational  correlate  of  the  polypentapeptide  of 
elastin.  Reproduced  with  permission  from  Ref.  34.  (B)  and  (C)  Schematic  representations 
of  the  helical  state  (0  spiral)  of  the  polypentapeptide  of  elastin  which  is  the  elastomeric 
state.  In  (C)  the  /3  turns  are  included  showing  them  to  function  as  spacers  with  hydrophobic 
contacts  between  the  turns  of  the  spiral.  Reproduced  with  permission  from  Ref.  31.  (D) 
Detailed  stereo  pair  of  the  spiral  axis  view  showing  space  for  water  within  the  0  spiral 
and  showing  suspended  segments  between  the  0  turn.  The  suspended  segment  runs  from  the 
a-carbon  of  Val4  to  the  a-carbon  of  Val1  and  is  referred  to  as  the  Val‘-Gly5-Val’  suspended 
segment.  It  is  within  the  segment  where  the  large  amplitude,  low  frequency  librational 
motions  are  most  pronounced  (see  Figs.  4  and  5).  Reproduced  with  permission  from 
Ref.  31.  (E)  Stereo  pair  of  the  side  view  of  the  0  spiral  of  the  polypentapeptide  of  elastin. 
This  is  one  of  a  family  of  closely  related  0  spirals.  Seen  here  are  gaps  in  the  surface  of  the 
0  spiral  on  each  side  of  the  suspended  segments.  The  contacts  between  turns  of  the  spiral 
utilize  the  Val  and  Pro  hydrophobic  side  chains.  The  structure  in  (E)  is  displayed  the  same 
as  in  the  schematic  representation  in  (C).  It  is  the  optimization  of  intramolecular  hydro- 
phobic  interactions  that  is  responsible  for  0-spiral  formation.  Reproduced  with  permission 
from  Ref.  35.  (F)  and  (G)  Supercoiling  of  0  spirals  to  form  twisted  filaments  of  dimensions 
similar  to  those  observed  in  transmission  electron  micrographs  of  negatively  stained  poly¬ 
pentapeptide,  a-elastin  and  tropoelastin  coacervates  (14,30,40]  and  of  negatively  stained 
elastin.  The  structure  is  given  in  (F)  in  a-carbon  to  a-carbon  virtual  bond  representation  and 
in  (G)  in  terms  of  spheres  of  different  sizes  centered  at  the  a-carbon  locations.  Reproduced 
with  permission  from  Ref.  28. 
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>3- spiral  of  the  pdypentapeptide  of  elastin 


twisted  filament  (super  coiled)  representations 


Figure  3.  ( Continued .) 

motions  is  shown  in  Figure  4  [29].  The  regularly  repeating  structure  of  the  polypen- 
tapeptide  provided  the  opportunity  to  demonstrate  unequivocally  that  entropic  elas¬ 
tomeric  force  occurs  on  formation  of  a  regular  nonrandom  structure.  One  of  the 
particularly  interesting  demonstrations  is  provided  by  dielectric  relaxation  studies 
[36]. At 20°C, where there is minimal elastomeric force, the real part of the dielectric
permittivity in the 1 GHz to 1 MHz frequency range exhibits a monotonically increasing
curve. This is shown in Figure 5. As the temperature is raised and elastomeric force
develops, a localized Debye-type relaxation develops, centered near 20 MHz. This has been
assigned to a peptide librational mode [14,36]. The intensity at 40°C, Δε = 70, and the
localized nature of the relaxation require a regular nonrandom elastomeric state, and the
relaxation identifies a backbone (peptide) librational mode that is directly contributing
to the high entropy of the relaxed state. While the phenomenology enumerated above requires
setting aside the random chain network analysis and eliminating solvent entropy as a
consideration, this experiment allows direct observation of the responsible internal chain
dynamics. This is the remarkable contribution of the polypentapeptide of elastin.
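
The shape of such a localized relaxation can be sketched with the standard single-relaxation-time Debye expression for the real part of the permittivity, ε′(f) = ε∞ + Δε / (1 + (f/f₀)²). The relaxation frequency (20 MHz) and intensity (Δε = 70) are the values quoted above; the high-frequency baseline `eps_inf` is a placeholder assumed purely for illustration, and the function name is hypothetical.

```python
def debye_real_permittivity(freq_hz, delta_eps=70.0, f0_hz=20e6, eps_inf=0.0):
    """Real part of a single Debye relaxation.

    delta_eps and f0_hz follow the values quoted in the text (40 C);
    eps_inf is an assumed placeholder baseline, not a measured value.
    """
    return eps_inf + delta_eps / (1.0 + (freq_hz / f0_hz) ** 2)


if __name__ == "__main__":
    # Sweep from 1 GHz down to 1 MHz, the range covered by the experiment.
    for f in (1e9, 1e8, 2e7, 1e6):
        value = debye_real_permittivity(f)
        print(f"{f:>12.0f} Hz  eps' - eps_inf = {value:6.2f}")
```

At the relaxation frequency the dispersion step has fallen to half of its low-frequency height (Δε/2), and almost the entire step is traversed within about two decades of frequency, which is what makes a single Debye process appear localized on a plot of this kind.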




Figure 4. Stereo pair view of a pentadecapeptide segment in the β-spiral conformation of
Fig. 3(E) in which the central Val¹ α-carbon to Val¹ α-carbon pentamer has been allowed to
assume conformations within a 2 kcal/mole residue cut-off energy. What is observed is a
rocking motion of peptide moieties. In the relaxed state in (A), large librational motions
are observed whereas in an extended state, in (B) at 130% extension, the librational
amplitudes are markedly damped. This decrease in amplitude of the librations, and possibly
an associated increase in the frequency of the librational motions on extension, is the
decrease in entropy that resists elongation and that provides the restoring force. This is
called the librational entropy mechanism of elasticity, and this mechanism for developing
entropic elastomeric force can occur in any polypeptide segment wherein the structure
favors librational processes. Reproduced with permission from Ref. 29.


The  Elastin  Fiber 

In the case of the elastin fiber, three of the five physical methods used to demonstrate
that the increase in elastomeric force in the temperature range below 40°C correlates with
an increase in molecular order in the polypentapeptide have been applied to elastin, to
the precursor protein, or to a chemical fragmentation product of elastin. Those physical
methods are microscopy [37-40], dielectric relaxation [41], and circular dichroism [42].
Furthermore, thermal denaturation has been directly observed on elastin, as on the
polypentapeptide of elastin, by following the slow loss of elastomeric force in a
thermoelasticity study and the slow loss of elastic modulus, monitored by stress/strain
curves at 37°C, which resulted from heating at 80°C [15,33]. Therefore, the entropic
elastomeric force exhibited by this protein is due neither to a random chain network nor
to the formation of clathrate-like water structures; rather, it too must derive from
internal chain dynamics. It may be noted that the slow thermal denaturation is, in the
practical sense, irreversible in water. Here again the internal chain dynamics can be
directly observed by dielectric relaxation studies on α-elastin (the chemical
fragmentation product of elastin) in the 1 GHz to 1 MHz frequency range, as shown in
Figure 6 [41]; this rests on the insight of the studies on the polypentapeptide of elastin
and on awareness that the polypentapeptide resides in the most prominent sequence between
cross-links. While the intensity of the relaxation is less, as expected with the
polypentapeptide being a fractional component of α-elastin, a relaxation is again observed
near 20 MHz.

Figure 5. Dielectric permittivity (real part) of the polypentapeptide of elastin
coacervate, which is 38% peptide and 62% water by weight. On raising the temperature from
20 to 40°C there develops an intense, localized, Debye-type relaxation near 20 MHz. As the
only dipolar entities are water and peptide moieties and because the intensity of the
relaxation is so large and the frequency relatively low with a low temperature dependence,
the relaxation is assigned to a peptide librational motion. Because the relaxation is at a
localized frequency, the polypentapeptides must be developing a regular structure as the
temperature is raised from 20 to 40°C. The development of this relaxation correlates with
the development of elastomeric force observed in Figure 1(A). The relaxation is taken to
be due to the librational motions shown in Figure 4(A). Reproduced with permission from
Ref. 36.


Elastomeric  Filaments  of  Muscle 

Studies of Maruyama et al. [43,44] and of Wang et al. [45,46] have resulted in the
isolation of a several million molecular weight elastic protein from muscle. Efforts to
characterize this protein microscopically have demonstrated the protein to be filamentous
[43]. This protein becomes a possible explanation for the passive tension of muscle and
for the residual passive force exhibited when the sarcomere length has been extended
beyond the point where the thick and thin filaments overlap. Microscopic studies on pulled
fibers have led to the identification of long narrow filaments either connecting the thick
filaments to the Z lines or directly running from Z line to Z line [47,48]. Consistent
with an effort to understand elastomeric force in terms of random networks, it has been
suggested that the stretching itself causes the filaments to form from a gel state (see
discussion following Ref. 47). Consistent with the random chain network theory of entropic
elasticity, efforts are made to understand elasticity in terms of an isotropic gel state
rather than in terms of the anisotropic filaments observed microscopically both on the
pulled fibers and for the isolated elastic protein of muscle. Having demonstrated internal
chain dynamics to be the source of entropic elastomeric force, it now becomes possible to
understand durable, nonrandom, anisotropic elastomeric filaments, and thereby to accept
the microscopic observations of the isolated elastic protein of muscle and of the pulled
muscle fibers.

[Figure 6 panel title: Dielectric relaxation spectrum of α-elastin coacervate, temperature
dependence.]

Figure 6. Dielectric permittivity (real part) of the coacervate state of α-elastin, which
is a 70,000 molecular weight chemical fragmentation product of elastin. Below 15°C there
is a monotonically increasing permittivity from several hundred MHz to 1 MHz. But as the
temperature is raised there develops a relaxation near 20 MHz. As α-elastin contains the
polypentapeptide of elastin, which exhibits a similar relaxation (see inset and Fig. 5),
this relaxation in α-elastin has been assigned to the same or similar peptide librational
processes. The development of the relaxation with temperature in the 15 to 45°C
temperature range correlates with the development of elastomeric force over the same
temperature range as seen in Figure 1(B). Thus this, along with considerable other data on
elastin, α-elastin and tropoelastin, including thermal denaturation of elastomeric force
and elastic modulus of elastin at 80°C [33], allows the conclusion that elastin too is a
nonrandom entropic elastomer. Reproduced with permission from Ref. 41.

Interconversion  of  Sodium  Channel  Conductance  States 

To  a  physical  chemist,  one  of  the  very  challenging  aspects  of  biology  is  under¬ 
standing  the  molecular  structure  and  mechanisms  of  ion-selective,  voltage-dependent 
transmembrane  channels.  The  conductance  state — open,  closed,  refractory — depends 
on  the  transmembrane  potential.  It  is  of  fundamental  interest,  for  example,  to  under¬ 
stand  what  structural  changes  and  processes  result  in  changing  the  conductance  state. 
This  issue  has  been  addressed  in  an  interesting  way  by  Rubinson  [49]  who  modelled 
the  sodium  channel  opening/closing  equilibrium  of  squid  nerve  “as  a  charged  region 
of  a  macromolecule  moving  under  the  influence  of  the  applied  field  and  confined 
elastically  by  interconnection  with  other  masses.”  The  result  was  the  characterization 
of  the  mechanical  properties  of  the  polypeptide  chain  segment  which  controlled  the 
gating  process  as  rubber-like  with  an  elastic  modulus  in  the  range  of  that  of  elastin. 
Taking the elastic modulus to be 5 × 10⁶ dynes/cm² as for elastin, the ratio of the
cross-sectional area to length (~400 Å) of the connecting chain segment would not
be unlike that of the polypentapeptide β-spiral in Figure 3. This is not to imply in any
way that a β-spiral like that of the polypentapeptide of elastin actually exists in the
sodium  channel  but  rather  to  emphasize  that  internal  chain  dynamics  and  specifically 
librational  processes  rather  than  random  chain  networks  would  be  required  to  under¬ 
stand  this  elastomeric  process. 

Enzyme  Mechanisms 

Several  aspects  of  enzyme  mechanisms  may  involve  entropic  elastomeric  forces 
within  the  protein,  for  example,  the  structural  rearrangements  resulting  from  the 
binding  of  an  allosteric  effector  [50],  induced  fit  elements  of  substrate  binding  [51], 
and the catalytic process itself. In the first two processes it is apparent that binding
to the surface of a viscoelastic protein could result in stretch-damping of librational
motions within proximal regions of the active site. In addition, the catalytic process
itself has been considered in terms of elastic forces; recall, for example, the
elastomeric “rack” of Eyring et al. [52]. A recent elegant description of this element of
enzyme catalysis has been presented by Gavish [53] in an exposition of “molecular dynamics
and the transient strain model of enzyme catalysis.” With emphasis on the viscoelastic
properties of proteins [54], Gavish described a detailed
model  for  stress  and  strain  in  the  enzyme-substrate  complex.  The  protein  exerts  an 
elastic  force  on  the  scissile  bond  of  the  substrate  resulting  in  a  strain  that  contributes 
to  the  potential  energy  required  for  bond  cleavage.  An  effective  means  of  increasing 
the  rate  of  the  catalytic  process  would  seem  to  be  to  employ  an  entropic  elastomeric 
force to induce strain in a substrate. Gavish states [53] that “factors that dominate
structural mobility in proteins should affect enzyme catalysis.” On the basis of the new
understanding of entropic protein elasticity it might be said that factors that modulate
entropic  elastomeric  force  should  modulate  enzyme  catalysis.  For  entropic  elas¬ 
tomeric  force  as  demonstrated  by  the  polypentapeptide  of  elastin,  it  is  not  mobility 
per  se,  but  rather  it  is  mobility  arising  from  a  regularity  of  structure  that  gives  rise 
to  force  capable  of  inducing  significant  strain.  As  shown  by  the  nuclear  magnetic 
resonance  (NMR)-derived  rotational  correlation  times  [15],  the  mean  mobility  of  the 
peptide  moieties  is  greater  at  25°C  before  the  inverse  temperature  transition  than  at 
37°C  after  the  inverse  temperature  transition,  yet  the  entropic  elastomeric  force  is 
minimal  at  25°C  and  dramatically  increases  until  37°C  [15].  Thus  it  is  not  motion 
per  se  but  the  nature  of  the  motion.  In  the  dielectric  relaxation  studies  at  25°C  there 
is  no  localized  relaxation  in  the  1  GHz  to  1  MHz  frequency  range,  but  as  the  tempera¬ 
ture  is  raised  to  40°C  there  develops  in  concert  with  the  development  of  elastomeric 
force  an  intense,  Debye-type  relaxation  near  20  MHz,  indicating  motion  within  a 
regular  structure  [36].  Thus  it  is  coherent  motion  (e.g.,  a  librational  mode)  within  a 
regular  structure  that  gives  rise  to  entropic  elastomeric  force.  This  provides  for  an 
anisotropic  structure  capable  of  producing  a  strain  in  an  enzyme  substrate  by  means 
of  an  entropic  elastomeric  force. 

Modulation  of  Transitions  in  the  Elastomeric  State:  Turning 
Entropic  Elastomeric  Force  On  and  Off 

In  the  preceding  discussion  of  elastomeric  processes  in  protein  systems  it  was  gen¬ 
erally  the  elastomeric  state  itself  that  was  considered,  but  the  modulation  of  the  tran¬ 
sition  to  and  from  the  elastomeric  state  can  be  an  effective  means  of  turning  on  and 
off  an  entropic  elastomeric  force.  The  modulation  can  be  biochemical  and  it  can  be 
involved  in  such  disparate  processes  as  the  modulation  of  enzyme  catalysis,  wound 
repair,  the  destruction  of  elastic  tissue  in  environmentally  induced  lung  disease,  and 
relaxin-induced  cervical  ripening  and  pubic  ligament  formation  attending  parturition 
and  their  reversal. 

Elastogenesis 

Before  addressing  the  more  biomedical  issues,  it  is  necessary  to  consider  the  impli¬ 
cations  arising  from  the  fact  that,  for  elastin  and  the  polypentapeptide  of  elastin,  elas¬ 
togenesis  arises  out  of  an  inverse  temperature  transition  and  is  therefore  dependent  on 
the  hydrophobicity  of  the  chains  which  are  to  constitute  the  elastomer.  Generally, 
elastogenesis  of  elastin  has  been  considered  to  be  the  physical  process  of  fiber  forma¬ 
tion  but  as  will  be  seen  below  it  is  simultaneously  fiber  formation  and  the  develop¬ 
ment  of  elastomeric  force.  This  is  not  possible  within  the  constraints  of  the  classical 
theory of rubber elasticity requiring, as it does, random chain networks, because the
formation  of  an  isotropic  random  chain  network  could  not  result  in  the  formation  of 
anisotropic  fibers.  Once  the  random  chain  network  perspective  is  set  aside,  it  be¬ 
comes  apparent  that  modulation  of  elastomeric  force  in  homoiothermic  animals  can 
be  achieved  by  shifting  the  temperature  range  in  which  the  inverse  temperature  tran¬ 
sition  occurs. 

Effect of Changing the Hydrophobicity. Using the polypentapeptide of elastin
as the model elastomer, analogs can be prepared in which the hydrophobicity of
the repeating unit is changed. Three physical characterizations can be compared:
(1) the temperature profile for aggregation, which is actually the temperature profile
for fiber formation, (2) the temperature dependence of conformational change followed
by circular dichroism, and (3) the temperature dependence of elastomeric force
of the γ-irradiation cross-linked analog which has been stretched to a fixed length
at 40°C. As shown in Figure 7, these transitions occur near 30°C for
(Val¹-Pro²-Gly³-Val⁴-Gly⁵)ₙ, the polypentapeptide of elastin. When the hydrophobicity of
the repeating unit is increased as in (Ile¹-Pro²-Gly³-Val⁴-Gly⁵)ₙ, the
Ile¹-polypentapeptide, the temperature of the transition, as followed by all three means,
shifts to lower temperature by some 20°C to near 10°C [55]. When the hydrophobicity of the
repeating unit is decreased as in (Val¹-Pro²-Gly³-Gly⁴)ₙ, where the Val⁴ residue has been
deleted, the temperature of the transition shifts some 20°C higher to a temperature near
50°C [56].
These  shifts  are  proportional  to  the  hydrophobicity  of  the  repeating  unit  as  estimated 
by  the  Nozaki  and  Tanford  [57]  and  the  Bull  and  Breese  [58]  scales.  This  reaffirms  the 
transition  to  be  an  inverse  temperature  transition,  with  a  temperature  inversely  pro¬ 
portional  to  the  hydrophobicity  of  the  repeating  unit.  It  is  to  be  emphasized  that  the 
transition  for  the  development  of  elastomeric  force  follows  the  hydrophobicity  shifts; 
this  further  reaffirms  development  of  elastomeric  force  to  be  the  result  of  an  inverse 
temperature  transition  leading  to  increased  order  for  the  elastomeric  state  [55,56]. 
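
The approximate transition temperatures quoted above can be tabulated to make the shifts explicit. This is a minimal sketch: the dictionary holds only the rounded values given in the text, the short names follow the Figure 7 abbreviations, and the helper that labels a polymer as "on" at a given temperature uses a crude step at the quoted temperature, an assumption made for illustration since the real transitions span several degrees.

```python
# Approximate transition temperatures (deg C) as quoted in the text.
TRANSITION_C = {
    "Ile1-PPP": 10.0,  # more hydrophobic repeat (Ile replaces Val at position 1)
    "PPP": 30.0,       # polypentapeptide of elastin
    "PTP": 50.0,       # less hydrophobic repeat (Val4 deleted)
}


def shift_relative_to_ppp(name):
    """Shift of the transition temperature relative to PPP, in deg C."""
    return TRANSITION_C[name] - TRANSITION_C["PPP"]


def force_is_on(name, temp_c):
    """Crude step model (assumption): force is 'on' above the transition."""
    return temp_c > TRANSITION_C[name]


if __name__ == "__main__":
    for name in TRANSITION_C:
        print(name, shift_relative_to_ppp(name), force_is_on(name, 37.0))
```

Under this simplified picture, a less hydrophobic analog such as PTP would not yet have developed elastomeric force at 37°C, whereas PPP and the Ile¹ analog would.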

Effect  of  the  Transition  on  the  Length  of  the  Elastomer.  The  steepness  of  the 
curve for the development of elastomeric force of the Ile¹-polypentapeptide near 10°C
[see  Fig.  7(C)]  is  the  result  of  matrix  shortening  and  the  fact  that  this  sample  had 
been  stretched  to  40%  elongation  at  40°C  whereas  the  other  samples  had  been 
stretched to 60% elongation at 40°C. As reflected in the temperature profiles of
aggregation, the noncross-linked polypeptide is soluble in all proportions at a
temperature below the onset of the inverse temperature transition [19]. This means that the
cross-linked  elastomers  would  dissolve  on  lowering  the  temperature  below  the  transi¬ 
tion  if  it  were  not  for  the  cross-links.  Instead  of  dissolving,  the  cross-linked  polypep¬ 
tides  simply  swell  to  the  limit  allowed  by  the  cross-links  and  by  the  structural 
transition.  This  results  in  remarkable  changes  in  the  length  of  the  cross-linked  matrix 
as  shown  in  Figure  8  where  the  length  is  measured  as  a  function  of  temperature  under 
zero load [59]. For 20 Mrad cross-linked polypentapeptide, the length of a strip of
matrix increases 2.2-fold as the temperature is decreased from 40 to 20°C. Elastin
shows  analogous  but  less  dramatic  lengthening;  a  classical  rubber  such  as  latex,  of 
course,  shortens  on  lowering  the  temperature  under  zero  load. 

Biochemical  Modulation  of  Hydrophobicity  (i.e.,  of  Transition  Temperature). 
Rather  than  decreasing  the  temperature  to  relax  the  elastomeric  force,  it  is  possible  to 
modify  enzymatically  the  hydrophobicity  of  the  elastomeric  polypeptides  and  thereby 
to  shift  the  temperature  of  the  inverse  temperature  transition.  This  shift  in  tempera¬ 
ture  of  the  inverse  temperature  transition  has  been  demonstrated  with  the  enzyme  pro¬ 
lyl  hydroxylase.  As  shown  in  Figure  9,  when  the  polypentapeptide  is  exposed  to 
prolyl  hydroxylase  with  the  resulting  hydroxylation  of  some  of  the  Pro  residues,  this 
decrease  in  hydrophobicity  causes  the  temperature  profile  for  aggregation  [60], 
equivalently  for  fiber  formation  and  for  elastomeric  force  development,  to  shift  to 
higher temperature. This shift occurs with only about one Pro in 10 hydroxylated; this
is only 1 hydroxylation in 50 residues. Thus enzymatic prolyl hydroxylation of a sample
of X²⁰-PPP held extended at 37°C should result in a decrease in elastomeric force when
held at constant length and an elongation of the sample when maintained at a constant
force.

Figure 7. Comparison of a series of studies on a related series of elastomeric sequential
polypeptides: Ile¹-PPP is (Ile¹-Pro²-Gly³-Val⁴-Gly⁵)ₙ; PPP is the polypentapeptide of
elastin, (Val¹-Pro²-Gly³-Val⁴-Gly⁵)ₙ; and PTP is (Val¹-Pro²-Gly³-Gly⁴)ₙ. These are all
high polymers with molecular weights greater than 50,000. (A) Temperature profiles for
aggregation, which have been shown to be temperature profiles for fiber formation; that
is, fiber formation occurs by an inverse temperature transition utilizing intermolecular
hydrophobic interactions. Increasing the hydrophobicity of the repeating unit as in
Ile¹-PPP causes the transition, i.e., fiber formation, to occur at lower temperature than
for PPP; Ile is more hydrophobic than Val. Decreasing the hydrophobicity of the repeating
unit as in PTP causes the aggregation, i.e., fiber formation, to occur at higher
temperature. (B) The conformation of each of the sequential polypeptides is followed by
circular dichroism of suspensions wherein the concentration was kept low enough so that
the particulate distortions due to the small suspended aggregates were not significant.
Observed in each case is an increase in intramolecular order as the temperature is raised
through the transition. (C) Temperature dependence of elastomeric force is followed when
the γ-irradiation cross-linked coacervates are set at a fixed extension. The development
of elastomeric force is found to have shifted to the temperature range of the inverse
temperature transition. This is a clear demonstration that elastomeric force develops as
the result of an inverse temperature transition dependent on the hydrophobicity of the
polypeptide. The elastomeric state is the more-ordered state and loss of elastomeric force
can be achieved by decreasing order. The temperature range of the inverse temperature
transition can be shifted by changing hydrophobicity of the polypeptide. If the
temperature range of the transition could be reversibly shifted at body temperature then
elastomeric force could be turned on and off. Adapted with permission from Refs. 55 and 56.

Figure 8. Effect of inverse temperature transition on the length of the elastomer. On
raising the temperature from 20 to 40°C the 20 Mrad γ-irradiation cross-linked
polypentapeptide of elastin, X²⁰-PPP (•-•), undergoes a dramatic shortening to 45% of its
20°C length. This study is carried out at zero load (zero force). The structuring that
occurs during the inverse temperature transition to form the β-spiral type of structure
results in a shortening of the strip of X²⁰-PPP. A similar but less dramatic and more
gradual shortening is observed for bovine ligamentum nuchae elastin (□-□). Typical of
rubbers, latex (○-○) expands on raising the temperature. Thus elastomeric force is lost in
part due to the structural transition. If the polypeptide were made less hydrophobic, the
transition temperature range should shift to higher temperature and the elastomer would,
at body temperature, lengthen and release or relax the force between two contact points.
Reproduced with permission from Ref. 59.
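
The stoichiometry quoted in this passage follows from the repeat itself: each pentamer contains one Pro among five residues, so hydroxylating about one Pro in 10 modifies one residue in 50. A short check, with function and parameter names that are illustrative only:

```python
from fractions import Fraction


def hydroxylated_residue_fraction(pro_per_repeat=1, residues_per_repeat=5,
                                  fraction_pro_hydroxylated=Fraction(1, 10)):
    """Fraction of all residues carrying a hydroxyl group.

    Defaults describe the Val-Pro-Gly-Val-Gly repeat with about one Pro
    in 10 hydroxylated, as stated in the text.
    """
    pro_fraction = Fraction(pro_per_repeat, residues_per_repeat)
    return pro_fraction * fraction_pro_hydroxylated


if __name__ == "__main__":
    frac = hydroxylated_residue_fraction()
    print(f"{frac.numerator} hydroxylation per {frac.denominator} residues")
```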

While  hydroxylation  is  an  irreversible  process,  it  becomes  a  trivial  conceptual  step 
to  consider  an  elastomer  with  occasional  serine  or  threonine  residues  that  could  be 
phosphorylated by a kinase causing the elastomer to extend (i.e., to relax) and that
could  be  dephosphorylated  by  a  phosphatase  causing  the  elastomer  to  shorten  and 
elastomeric  force  to  again  develop.  It  is  suggested  that  such  processes  could  be 
involved  in  the  relaxin-induced  cervical  ripening  and  interpubic  ligament  formation 
and  their  reversal  after  parturition.  Phosphorylation  of  enzymes  and  other  proteins 
such  as  channels  could  be  expected  to  have  analogous  effects  on  polypeptide  seg¬ 
ments  capable  of  exerting  entropic  elastomeric  force. 

Biomedical  Relevance 

Wound  Repair.  In  scar  tissue  there  is  a  preponderance  of  collagen  fibers  with  few 
or  no  elastin  fibers  [61].  In  optimizing  wound  repair  which  involves  sewing  the 
breach  together  with  collagen  fibers,  high  levels  of  prolyl  hydroxylase  occur.  Hy¬ 
droxylation  of  proline  residues  in  collagen  is  necessary  for  release  of  collagen  from 
the  cell;  it  is  required  to  stabilize  the  collagen  triple-stranded  helix,  and  it  protects 
collagen from nonspecific proteolysis (see references within Ref. 62). The same enzyme
hydroxylates proline residues in tropoelastin, the single precursor protein of
elastin fibers. Based on the shift to higher temperatures of the temperature profile for
fiber formation of the polypentapeptide of elastin (see Fig. 9) that results from prolyl
hydroxylation, this decrease in hydrophobicity of tropoelastin would be expected to
have a similar effect. The result would be less elastic fiber formation and the fiber
formed would be in a more nearly relaxed state and unable to provide an appropriate
entropic elastomeric restoring force. This has been demonstrated in cell cultures of
aortic smooth muscle cells induced to high levels of hydroxylation by the addition of
ascorbic acid required by prolyl hydroxylase [63].

Figure 9. Prolyl hydroxylation of the polypentapeptide of elastin by the enzyme prolyl
hydroxylase decreases the hydrophobicity of the polypeptide and shifts the temperature
range for the inverse temperature transition by 10°C to higher temperatures. Synthetic
polypentapeptide in which 10% of the pentamers contained hydroxyproline instead of proline
shows a similar shift. Of the order of one hydroxyl introduced in 50 residues causes a
substantial shift in the transition, as much as 10°C. Considered in terms of Figure 7(C),
this would shift the development of elastomeric force to a higher temperature. Considered
in terms of Figure 8, this prolyl hydroxylation would at 37°C result in a lengthening of
the elastomer. Thus an enzymatic modification is expected to cause a relaxation of
elastomeric force at body temperature. If the enzymatic modification were phosphorylation
and dephosphorylation, then entropic elastomeric force could be turned off and on as
desired for changing structural states in connective tissue and elastomeric components of
muscle or for changing the functional state of an enzyme or channel, for example.

Environmentally Induced Lung Disease. In environmentally induced lung disease,
such as pulmonary emphysema, the elastin fibers are fragmented and dysfunctional.
When the lung is challenged by toxic substances, it is proposed that the
ensuing repair response results in the elaboration of high levels of prolyl hydroxylase.
The consequent overhydroxylation of tropoelastin would limit elastin fiber formation;
those fibers that did form would be able to exert a more limited elastomeric
function because of the shift to higher temperature of the inverse temperature
transition; and it is not unreasonable to expect that the poorly formed fibers would be
more susceptible to proteolytic degradation [62]. In general, any process, such as
inhalation of cigarette smoke, that resulted in oxidation of the elastomeric chains in
elastin would cause a loss of elastic recoil.

Events  Attending  Parturition  and  Their  Reversal.  Interpubic  Ligament  Forma¬ 
tion.  There  are  remarkable  deformations  and  restoring  forces  attending  and  following 
parturition. In mice and guinea pigs [64,65] and in some women there is the
development of an interpubic ligament in the days prior to delivery. In mice, for example,
the  pubic  symphysis  is  normally  less  than  2  mm  in  width.  In  the  days  before  delivery 
an  interpubic  ligament  develops  becoming  5-6  mm  in  length  allowing  for  enlarge¬ 
ment  of  the  birth  canal.  By  the  third  or  fourth  day  after  delivery  the  gap  between  the 
pubic bones is drawn back to 2 mm [65]. What connective tissue processes could
allow this elongation, and then within the time period of a few days what restoring
forces  could  result  in  the  shortening?  The  above  mentioned  biochemical  process  of 
decreasing  the  hydrophobicity  by  phosphorylation  could  lead  to  lengthening  by  shift¬ 
ing  of  the  temperature  range  of  the  inverse  temperature  transition  for  the  development 
of  elastomeric  force  to  higher  temperature.  The  result  would  be  a  biochemically  con¬ 
trolled  relaxation  of  elastomeric  force.  Subsequent  removal  of  the  phosphate  moieties 
by  phosphatases  would  result  in  a  restoration  of  elastomeric  force  and  a  shortening  of 
the  elastomer.  Interestingly,  the  shortening  from  about  5  mm  to  2  mm  is  similar  to 
the  shortening  of  the  cross-linked  polypentapeptide  (seen  in  Fig.  8)  on  going  from  the 
relaxed state at 20°C to the elastomeric state at 37°C. A 20°C increase in the
temperature range of the inverse temperature transition, brought about by the decrease in
hydrophobicity due to phosphorylation, could result in the lengthening, and
dephosphorylation could then return the transition to its normal physiological range, in
which it is just completed at body temperature.
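
The comparison of lengths drawn in this paragraph can be made explicit. The sketch below uses only figures quoted in the text (a 5-6 mm ligament returning to 2 mm, and the 2.2-fold length change of the cross-linked polypentapeptide between 40 and 20°C); the function name is illustrative.

```python
def shortened_fraction(long_length, short_length):
    """Length after shortening, as a fraction of the extended length."""
    return short_length / long_length


if __name__ == "__main__":
    # Interpubic ligament in mice: 5-6 mm before delivery, 2 mm afterwards.
    print(f"ligament, 5 mm -> 2 mm: {shortened_fraction(5.0, 2.0):.2f}")
    print(f"ligament, 6 mm -> 2 mm: {shortened_fraction(6.0, 2.0):.2f}")
    # Cross-linked polypentapeptide: 2.2-fold longer at 20 C than at 40 C.
    print(f"polypentapeptide, 2.2-fold: {shortened_fraction(2.2, 1.0):.2f}")
```

The ligament shortens to roughly 0.33 to 0.40 of its extended length, against about 0.45 for the polypentapeptide strip, so the two changes are of comparable magnitude. Note that the polypentapeptide figure is quoted for the 20 to 40°C interval, slightly wider than the 20 to 37°C interval discussed here.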

Cervical  Ripening.  The  relaxing  and  softening  of  the  cervix  is  referred  to  as  cervi¬ 
cal  ripening.  This  occurs  in  the  hours  preceding  delivery  and  is  thought  to  be  under 
the control of the hormone relaxin [66-68]. Here one could employ elastin fibers as
considered  for  the  interpubic  ligament  formation.  However,  if  uterine  smooth  muscle 
fibers  contained  elastomeric  filaments  as  observed  in  striated  muscles,  then  phos¬ 
phorylation  and  dephosphorylation  of  intracellular  elastomeric  filaments  could  readily 
be  considered  as  a  potential  mechanism.  This  is  a  particularly  attractive  hypothesis  as 
the  mechanism  of  action  of  relaxin  is  considered  to  involve  the  activation  of  kinases 
and  phosphatases  in  a  time-dependent  manner  [69].  Once  such  a  hypothesis  is  raised 
involving  uterine  smooth  muscle  cells  it  is  natural  to  inquire  whether  such  a  process 
could  be  operative  in  vascular  smooth  muscle  cells  and  be  relevant  to  some  forms  of 
essential  hypertension. 

Requiem  for  the  Random  Chain  Network  Theory  of  Entropic  Protein  Elasticity 

One of the purposes of the above limited enumeration of the possible roles of entropic
elastomeric force in protein structure and function is to demonstrate the reasoning
that becomes possible once the shackles of the classical theory of rubber elasticity
(requiring as it does random chain networks) are removed from consideration of entropic
protein elasticity. Useful approaches of three decades ago should give way to
more  accurate  descriptions,  made  possible  by  improvements  in  physical  methods  and 
their  interpretation.  These  more  correct  descriptions  can  lead  to  new  contributions,  to 
new  concepts  of  mechanism  that  can  be  tested  by  a  wide  range  of  experimental  ap¬ 
proaches.  It  is  pernicious  to  hold  that  polypeptide  backbone  motions  of  the  order  of 
nanoseconds  can  only  be  achieved  by  random  chain  networks.  It  is  contrary  to  pro¬ 
gress  in  understanding  protein  structure  and  function  to  assume  that  the  only  exam- 
p'es  of  ordered  polypeptide  states  are  a-helix,  /3-sheet,  and  triple-standed  helix  and 
that  all  else  is  random.  It  is  particularly  curious  to  see  protein  structure  deduced  on 
the  basis  of  a  theoretical  approach  that  has  found  it  necessary  to  invoke  phantom 
chains  that  occupy  no  space  and  that  can  pass  through  one  another  [70].  Once  the 
random  chain  network  theory  of  entropic  protein  elasticity  is  set  aside,  progress  in 
understanding  many  fundamental  processes  utilizing  entropic  protein  elasticity  can 
occur  more  readily. 
Abstract

The scope of multistep modeling (msm) is expanded by adding a least-squares minimization step to the
procedure, so that the reconstructed backbone is consistent with a set of C-alpha coordinates. The analytical
solution for the Phi and Psi angles that fit a set of C-alpha x-ray coordinates [1] is used for tyr-tRNA
synthetase. Phi and Psi angles for the regions where this method fails are obtained by minimizing, in a
least-squares sense, the difference in C-alpha distances between the computed model and the crystal structure.
We present a stepwise application of this part of msm to the determination of the complete backbone
geometry of the 321 N-terminal residues of tyrosine tRNA synthetase, to a root mean square deviation of
0.47 Å from the crystallographic C-alpha coordinates.

Introduction 

Our recent papers [2, 3] concern the prediction of macromolecular structure from
partial experimental data, for example, medium-resolution crystallographic C-alpha
coordinates. Purisima and Scheraga [1] approached this problem by using an analytical
method to find the set of Phi and Psi angles of the polypeptide backbone when
threaded through the C-alpha trace. These authors pointed out that the method fails
under certain conditions. In the regions where the analytical solution failed, Purisima
and Scheraga [1] used a least-squares fitting of the C-alpha atoms of the polypeptide
chain onto the C-alpha atoms of the crystal structure [4] by optimizing a function

$$\sum_{i=1}^{N} \mathbf{a}_i \cdot \mathbf{a}_i$$

where N is the number of guide points (C-alpha atoms in this case) and
$\mathbf{a}_i = \mathbf{r}_i - \mathbf{r}_i^0$. In the fixed coordinate system, the positions
of the i-th atom with respect to the origin are represented by the vectors $\mathbf{r}_i$ and
$\mathbf{r}_i^0$, where the former corresponds to the i-th atom in the computed conformation
and the latter to its position in the x-ray structure.
Our attempt to use the analytical method to refine the C-alpha trace of the 321 N-terminal
residues of tyr-tRNA synthetase gave Phi and Psi values for 85% of the residues.

This paper presents the use of least-squares minimization of the respective distances
between C-alpha atoms in the model polypeptide and the crystallographic
guide points to obtain the backbone geometry for the remaining 15% of the residues,
consistent with the rest of the backbone. We further refine the model by computer graphics
and energy optimization. We illustrate our results in the following sections along with
a brief description of our method and the experience gained during the assignment of Phi
and Psi angles for all 321 residues.

Method  and  Results 

The C-alpha coordinates of the 321 N-terminal residues of tyr-tRNA synthetase were
obtained from the Brookhaven Protein Data Bank. The method and results are subdivided into
the following four sections:

Analytical  Method 

CONVRT and CONMAP, developed by Purisima and Scheraga [1], are two programs
(Document No. NAPS-04122) obtained from the ASIS National Auxiliary Publication
Service which give all possible sets of Phi and Psi angles consistent with a
given set of C-alpha coordinates. Each Phi and Psi pair is assigned a score of one if its
values lie within the Ramachandran allowed region and a score of zero otherwise. The sum
of these scores for every solution is then used as the criterion to select one from the
many sets of Phi and Psi satisfying the same C-alphas. The various segments of the model
which fit on the C-alpha trace of the crystal structure are shown in Figure 1. There are
11 separate regions where the backbone dihedral angles Phi and Psi remain undetermined.
We refer to these regions as “gaps” in later sections. Table I lists the first and last
residue numbers of each segment.


Figure 1. Dotted line = C-alpha trace; solid line = backbone segments obtained from the
analytical method.

Table I. Segments for which Phi and Psi angles are obtained using CONVRT and CONMAP.

Segment no.    N-terminal residue no.    C-terminal residue no.
     1                   3                        20
     2                  21                        44
     3                  46                        54
     4                  57                        76
     5                  79                        81
     6                  83                       111
     7                 115                       190
     8                 192                       212
     9                 214                       225
    10                 232                       237
    11                 244                       292
    12                 294                       318
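The segment boundaries in Table I can be turned directly into a list of junctions between consecutive segments. The short sketch below does only that arithmetic; the function name `junctions` is ours, and reading each junction as one of the "gaps" is an interpretation rather than something the text states explicitly. Under that reading, twelve segments give eleven junctions, and two of them leave six residues uncovered.

```python
# Segment boundaries (first residue, last residue) copied from Table I.
SEGMENTS = [
    (3, 20), (21, 44), (46, 54), (57, 76), (79, 81), (83, 111),
    (115, 190), (192, 212), (214, 225), (232, 237), (244, 292), (294, 318),
]


def junctions(segments):
    """Return (last_of_previous, first_of_next, n_missing) for each pair of
    consecutive segments; n_missing counts the residues lying strictly
    between the two segments."""
    out = []
    for (_, end), (start, _) in zip(segments, segments[1:]):
        out.append((end, start, start - end - 1))
    return out


if __name__ == "__main__":
    for end, start, missing in junctions(SEGMENTS):
        print(f"between residues {end} and {start}: {missing} residue(s) not covered")
```

The two six-residue junctions (226-231 and 238-243) are in line with the two gaps of six residues that are later handled by computer graphics.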


Optimization 

The various segments of the structure for which Phi and Psi angles are obtained
from the previous step lose their relative spatial orientation when arbitrary values are
assigned to the Phi and Psi angles of the gap residues. The segments on both sides of
each gap are brought back to their relative orientation by optimizing a minimal set of
distances between the C-alpha atoms in the model against the corresponding distances in the
crystal structure. A minimal set of distances is a subset of all distances between the
C-alpha atoms which is necessary and sufficient to preserve a 3D configuration. This
choice is based on the concept that the position of a point is uniquely defined if its
distances from four non-coplanar points are known [5]. This is a special case of more
general mathematical conditions for the contraction of dimensionality and is stated in
more physical terms by Purisima and Scheraga [6]. Gaps are treated one or two at a
time depending on their separation along the chain. The variable parameters for the
optimization are the Phi and Psi angles of the residues in the gap along with the Phi
and Psi angles of three adjacent residues on both sides of the gap. The set of distances
chosen consists of the distances of all the C-alpha atoms in the moving segment from
four non-coplanar C-alpha atoms of the fixed segment. The moving segment consists of all
the C-alpha atoms on the C-terminal side of the gap and the C-alpha atoms of the
residues whose Phi and Psi angles are assigned as variables. The fixed segment is the
N-terminal side of the gap under consideration. Figure 2 shows the fixed and moving
segments in our definition. The function to be optimized is given by

$$F = \sum_{i=1}^{M} (D_i - d_i)^2$$

where M is the number of distances picked by the criteria described above, D is the
vector of selected distances in the model, and d is the vector of corresponding distances
in the crystal structure. The optimization procedure adopted is the well-known
damped least-squares procedure of Levenberg [7]. The iterative equation for the k-th
iteration is given by

$$\phi_{k+1} - \phi_k = (J^{T}J + pI)^{-1} J^{T} F_k$$

where $\phi$ is the variable parameter, J is the rectangular matrix (M x N) of derivatives
$\partial F / \partial \phi$, I is the square identity matrix of order (N x N), p is a damping
constant, and N is the number of variables.
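As an illustration of the damped least-squares iteration, the sketch below fits two angles of a planar two-link arm so that the distances from its tip to three fixed points match target distances. Everything about this toy problem is an assumption made for the example: the arm, the reference points `REFS`, the finite-difference Jacobian, the sign convention (residual taken as d − D with J = ∂D/∂φ), and the simple rule that relaxes or strengthens the damping p after each trial step. It is not the authors' program, which varies backbone dihedral angles.

```python
import math

REFS = [(2.0, 0.0), (0.0, 2.0), (-1.0, -1.0)]  # hypothetical fixed guide points


def model_distances(phi):
    """Distances from the tip of a planar two-link arm (unit links) to REFS."""
    a, b = phi
    x = math.cos(a) + math.cos(a + b)
    y = math.sin(a) + math.sin(a + b)
    return [math.hypot(x - rx, y - ry) for rx, ry in REFS]


def residuals(phi, target):
    """Residual vector d - D between target and model distances."""
    return [d - D for D, d in zip(model_distances(phi), target)]


def jacobian(phi, h=1e-6):
    """Forward-difference M x N matrix of dD_i/dphi_j."""
    base = model_distances(phi)
    cols = []
    for j in range(len(phi)):
        shifted = list(phi)
        shifted[j] += h
        cols.append([(s - b) / h for s, b in zip(model_distances(shifted), base)])
    return [[cols[j][i] for j in range(len(phi))] for i in range(len(base))]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def damped_least_squares(phi, target, p=1e-2, iterations=100):
    """Iterate phi <- phi + (J^T J + p I)^-1 J^T r, adapting the damping p."""
    phi = list(phi)
    n = len(phi)
    f_old = sum(r * r for r in residuals(phi, target))
    for _ in range(iterations):
        J = jacobian(phi)
        r = residuals(phi, target)
        jtj = [[sum(J[i][u] * J[i][v] for i in range(len(r))) + (p if u == v else 0.0)
                for v in range(n)] for u in range(n)]
        jtr = [sum(J[i][u] * r[i] for i in range(len(r))) for u in range(n)]
        step = solve(jtj, jtr)
        trial = [value + s for value, s in zip(phi, step)]
        f_new = sum(x * x for x in residuals(trial, target))
        if f_new < f_old:   # accept the step and relax the damping
            phi, f_old, p = trial, f_new, p * 0.5
        else:               # reject the step and damp more strongly
            p *= 10.0
    return phi, f_old


if __name__ == "__main__":
    target = model_distances([0.5, 1.0])          # distances of a known geometry
    angles, f = damped_least_squares([0.2, 0.6], target)
    print(angles, f)
```

A larger p shortens the step and turns it toward the steepest-descent direction, which is what makes the scheme tolerant of a poor starting geometry.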

Seven of the 11 gaps closed rapidly during optimization, showing steady convergence of
the standard deviation. However, two other segments gave a local mirror structure
near the gap. This arises because (1) the starting geometry is far from the minimum,
and (2) the four fixed C-alphas chosen on the N-terminal segment may be very
close to each other. This problem is readily overcome by using the negative
of the Phi and Psi angles obtained from the ill-optimized structure for residues where
the chain direction is opposite. This set of changed Phi and Psi angles is then used as
a starting model for closing the gaps. Figure 3 shows the result of this step.
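The role of the four reference points, and one way a mirror-image solution can appear, can be seen in a small worked example. The sketch below recovers a point from its distances to four reference points by subtracting the first sphere equation from the other three, which leaves a 3 x 3 linear system. When the references are coplanar the system is singular, and a point and its reflection through that plane have identical distances. The coordinates and the helper names (`locate`, `det3`) are hypothetical, and the coplanar case is shown only as a limiting illustration, not as the authors' diagnosis of the two problematic gaps.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def locate(refs, dists):
    """Position of a point from its distances to four reference points.

    Uses 2 (r_j - r_0) . x = |r_j|^2 - |r_0|^2 - d_j^2 + d_0^2 for j = 1..3
    and Cramer's rule. Returns None when the references are coplanar."""
    r0, d0 = refs[0], dists[0]
    a = [[2.0 * c for c in sub(r, r0)] for r in refs[1:]]
    b = [dot(r, r) - dot(r0, r0) - d * d + d0 * d0
         for r, d in zip(refs[1:], dists[1:])]
    determinant = det3(a)
    if abs(determinant) < 1e-9:
        return None
    solution = []
    for k in range(3):
        ak = [row[:] for row in a]
        for i in range(3):
            ak[i][k] = b[i]
        solution.append(det3(ak) / determinant)
    return solution


if __name__ == "__main__":
    point = [0.3, -0.4, 0.7]
    mirror = [0.3, -0.4, -0.7]
    good = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]   # non-coplanar
    flat = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]]   # all in the z = 0 plane
    print(locate(good, [math.dist(point, r) for r in good]))
    print(locate(flat, [math.dist(point, r) for r in flat]))
    print([math.dist(point, r) for r in flat])
    print([math.dist(mirror, r) for r in flat])
```

With the coplanar set the two printed distance lists coincide, so a distance-only target cannot tell the point from its mirror image.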
Computer Graphics Modeling

The remaining two gaps, each with 6 contiguous residues with undefined Phi and Psi
angles, pose problems in convergence. These segments are modeled using our graphics
program (moses) [8] and then used as input for the optimization step. Figure 4
shows the juxtaposed structure of this region before and after modeling.

The set of Phi and Psi angles for all 321 residues obtained by applying
the above steps gave a root mean square deviation of 1.7 Å when least-squares
juxtaposed on the crystal structure. This structure is further optimized against all
distances shorter than 10 Å in the crystal C-alpha coordinates while varying all 319 Phi
and Psi angles of the built model. The root mean square deviation improved to 0.47 Å.
Figure 5 shows the complete backbone structure juxtaposed on the C-alpha trace of the
crystal structure.

Energy  Minimization 

A rough analysis of the set of Phi and Psi angles shows that 15-20% of the
residues fall outside the Ramachandran allowed region. This is not unexpected, as it is
known that 10% of the Phi and Psi angles in protein crystal structures are outside
this allowed region. In crystal structures this is due to the considerable deviation of the
peptide geometry from standard values. Here it could be because of the attempt to fit
a standard geometry to a 3 Å resolution structure. Forty-five cycles of energy optimization
on the entire backbone, with Ala in place of all other residues except Gly and Pro,
have been performed. The purpose of this step is to remove backbone short contacts
which may exist in the final backbone and to improve the Phi values of the prolines.
The positions of the C-alpha atoms were kept fixed during optimization. In spite of the
constraint on the C-alpha atoms, the deviation of the bond lengths and bond angles
from the standard ECEPP geometry is <0.05 Å and <10.0°, respectively. The maximum
out-of-plane deformation of the trans peptide bond is 18 degrees, while only 11
of them are >10 degrees. Figure 6 shows the final energy-optimized structure fitted
through the C-alpha trace. With the constraint on the C-alpha atoms and the limited
cycles of energy optimization, the value of the 6-12 component of the total energy is
78 kcal for nearly two million atom pairs.

Figure 4. Dotted line = C-alpha trace; broken line = model with two gaps of 6 residues
each whose Phi and Psi values are set to 180, 180; and solid line = fitted model using
computer graphics.


Discussion 


Grafting of the respective side chains in place of alanine and refinement of
the full structure using energy optimization is in progress. Although the structure
with all the side chains is not yet known, we use the developed model to describe the
topology of this protein following the description of Richardson [9]. The topological
diagram (Fig. 7) of the beta-extended and alpha-helical regions resembles that of
dihydrofolate reductase and glutathione reductase [9]. Recognition of such homologous
features provides additional templates that can be identified in proteins and used in
comparative modeling.
Abstract

A survey of 50 protein structures (47 globular and 3 fibrous) indicates that intrahelical ion pairs between
oppositely charged residues (Glu−, Asp−/Lys+, Arg+) 3 or 4 residues apart along the helix may have a
stabilizing effect on alpha helices exposed to solvent. It is found that the i, i ± 3/4 types of ion pairs are
the most predominant, and their observed frequencies are significantly greater than their expected
frequencies. Such a preference is not seen for the like-charged pairs which served as a control. It was found that
the normalized frequencies of these ion pairs increased with the helix length. An analysis of the distances
between the charged groups in ion pairs suggests that only about 20% of the ion pairs are stabilized by
hydrogen bonding (salt bridged), about 40% by electrostatic interactions, and the remaining may be
stabilized by solvation: forming water bridges or plumes of water molecules around the charged groups. The
fibrous proteins, which have a proportionately larger solvent exposed area than the globular proteins, have
a higher density of intrahelical or secondary structural ion pairs. They are distinguished from the globular
proteins which contain fewer ion pairs/charged residues because of their smaller solvent exposed area. The
results indicate that the ion pairs may have a stabilizing effect on alpha helices exposed to solvent.

Introduction 

One of the surprising features of the dumbbell-shaped molecular structure of skeletal
muscle troponin C [1,2] is the stability of the solvent-exposed alpha-helical
“handle.” It is found that the amino acid composition of the “handle” is rich in
charged residues, Glu−, Lys+, Asp−, and Arg+ [3]. Furthermore, it is found that
there are several ion pairs involving oppositely charged residues located 3 or 4
residues apart along the helix handle. The ion pairs may be visualized to be involved
in (a) direct hydrogen bonding or salt bridges, (b) attractive electrostatic interactions,
and (c) bridging by water molecules or solvation by surrounding plumes of water
molecules. It is surmised that the juxtaposition of oppositely charged residues or ion
pairs can protect and screen the solvent-exposed helix from denaturation or “erosion” [3].
Results

Fibrous  Proteins-Coiled  Coils 

Troponin C can be regarded as a “fibroglobular” structure comprising a fibrous
component (the alpha-helical “handle”) and two globular halves (N and C domains).
The high density of charged residues and ion pairs in the helix “handle” prompted us
to analyze the amino acid sequences of the fibrous proteins. The double-helical coiled
coil structures are an important class of fibrous proteins occurring in nature. Some
examples are alpha-tropomyosin [4] and myosin rod [5] from skeletal muscle, and
lamin C [6], a major protein of the nuclear envelope. An intrinsic feature of these
fibrous proteins is the presence of a large number of intrahelical ion pairs (Table I). In
addition, the intrinsic sequence periodicity of the coiled coil results in the presence of
oppositely charged residues at i, i ± 5/7 residues apart, which allow the formation of
interhelical salt bridges between helices [4, 5]. Apparently the need for the numerous
intrahelical ion pairs arises from the large exposed surface area. It appears that, as in
the case of the troponin C “handle,” the large concentration of intrahelical ion pairs
protects the individual alpha helices of the coiled coil from “erosion” [7], while the
periodic interhelical salt bridges and hydrophobic interactions stabilize the tertiary
structure of the coiled coil [4, 5]. In alpha-tropomyosin there are 68 ion pairs of the
type i ± 3/4, amounting to 24% of the total number of residues, while in lamin C and
myosin rod these values are 14% and 22%, respectively. The ion pair density found in these
fibrous proteins is significantly larger than that found in globular proteins [7].
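The percentages quoted above can be checked against the i ± 3 and i ± 4 columns of Table I. The sketch below simply divides the combined number of i ± 3/4 oppositely charged pairs by the number of residues; the function name is ours, and this reading of "percent of the total residues" is an assumption that happens to reproduce the quoted figures to within one percentage point.

```python
# (number of residues, i±3 opposite pairs, i±4 opposite pairs) copied from Table I.
FIBROUS = {
    "alpha-tropomyosin": (284, 34, 34),
    "lamin C": (572, 42, 41),
    "myosin rod": (1094, 102, 144),
}

# Percentages quoted in the text.
QUOTED = {"alpha-tropomyosin": 24, "lamin C": 14, "myosin rod": 22}


def ion_pair_percent(n_residues, n3, n4):
    """i±3/4 ion pairs expressed as a percentage of the residue count."""
    return 100.0 * (n3 + n4) / n_residues


if __name__ == "__main__":
    for name, row in FIBROUS.items():
        print(f"{name}: {ion_pair_percent(*row):.1f}% (quoted {QUOTED[name]}%)")
```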

Globular  Proteins 

The analysis was also extended to globular proteins [7]. A total of 47 globular
proteins from the Brookhaven Protein Data Bank were sampled, displaying different
tertiary structures and functions. The proteins were searched for the occurrence of
oppositely charged pairs as well as like-charged pairs at i, i ± 1, 2, ... 8 residues apart
in the alpha-helical and nonhelical regions. The survey of the nonhelical regions and
like pairs served as a control for this study. Possible correlations between ion pairs
and helix length, order of the charged residues in ion pairs and its relation to helix


Table I. Fibrous proteins.

Protein              No. of      i±1   i±2   i±3   i±4   i±5   i±6   i±7   i±8
                     residues

Oppositely charged pairs
Alpha tropomyosin      284        26    26    34    34    38    25    39    29
Lamin C                572        29    29    42    41    52    42    48    37
Myosin rod            1094        63    68   102   144    81    84   103    92

Like charged pairs
Alpha tropomyosin                 31    28    28    28    20    29    41    28
Lamin C                           41    48    33    30    33    24    31    31
Myosin rod                        94   106    82    62    63    84   103    78
dipole were also investigated. The distances between the charged groups in the ion-ion
pairs (opposite and like pairs) were also determined to evaluate the nature and
strengths of the interactions between the pairs.
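The search for charged pairs at separations of 1 to 8 residues can be sketched as a simple scan over a sequence. The example below is a minimal illustration that assumes one-letter residue codes, treats only Glu/Asp as negative and Lys/Arg as positive (the four residues named in this paper), and counts each pair once. It does not restrict the scan to helical or nonhelical regions and does not compute expected frequencies, both of which the survey itself required; the test sequence is invented.

```python
NEGATIVE = set("DE")   # Asp, Glu
POSITIVE = set("KR")   # Lys, Arg


def charge(residue):
    """Return -1, +1, or 0 for a one-letter residue code."""
    if residue in NEGATIVE:
        return -1
    if residue in POSITIVE:
        return 1
    return 0


def count_pairs(sequence, max_sep=8):
    """Count oppositely charged and like-charged pairs at each separation k
    (residues i and i + k), for k = 1 .. max_sep."""
    opposite = {k: 0 for k in range(1, max_sep + 1)}
    like = {k: 0 for k in range(1, max_sep + 1)}
    for i, first in enumerate(sequence):
        q1 = charge(first)
        if q1 == 0:
            continue
        for k in range(1, max_sep + 1):
            if i + k >= len(sequence):
                break
            q2 = charge(sequence[i + k])
            if q2 == 0:
                continue
            if q1 * q2 < 0:
                opposite[k] += 1
            else:
                like[k] += 1
    return opposite, like


if __name__ == "__main__":
    opposite, like = count_pairs("AEAAKAEAAKA")   # hypothetical test sequence
    print("opposite:", opposite)
    print("like:    ", like)
```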

Of a total of 299 alpha helices in the 47 globular proteins considered, 163 helices
contain either an ion pair, a like pair, or both. The remaining 136 helices do not
contain any charged pairs and are generally found buried. Of the 163 helices, 135
contain at least one ion pair while 90 contain at least one like pair. There are 73 helices
with ion pairs only, while there are only 28 helices with like pairs only; the remaining
62 contain both. The distribution of helix lengths is given in Fig. 1. The frequencies of
occurrence of the i, i ± 1, 2, 3, 4 ... 8 types of ion-ion pairs are shown in Table II. It is
seen that the i ± 3 and i ± 4 types of ion pairs, juxtaposed on the same side of the helix,
are the most predominant, followed by i ± 1 and then i ± 2, in which the residues are on
the opposite


Figure  1.  The  distribution  of  helical  segments  as  a  function  of  their  lengths.  The  mean 
length  of  the  helices  is  about  3-4  turns. 


Table II. Globular proteins.

Type of pair    Oppositely charged pairs    Like charged pairs
i, i ± 1               95 (100)                 118 (100)
i, i ± 2               70 (88)                   75 (85)
i, i ± 3              120 (74)                   76 (79)
i, i ± 4              128 (66)                   73 (77)
i, i ± 5               56 (54)                   58 (61)
i, i ± 6               62 (50)                   53 (52)
i, i ± 7               59 (40)                   62 (42)
i, i ± 8               55 (30)                   46 (34)

Expected numbers are given in parentheses.
sides of the helix. There appears to be a slight preference for the i ± 4 type over the
i ± 3. The frequencies of occurrence of i ± 1, 2, 3 ... 8 types of pairs on beta
strands and turns (Fig. 2), and on the proteins as a whole (Fig. 3), were also
determined. It is found that the beta strands and turns showed no preference for the
i ± 3/4 type of ion pairs. This is to be expected because in beta strands the extended
peptide chain places the side chains farther apart. The normalized frequencies for ion
pairs on alpha helices were also determined (Fig. 4). It is found that the longer
helices contain more than their proportionate number of ion pairs. However, like pairs
do not show this property. An important observation is that the observed frequencies
of ion pairs are significantly greater than the expected frequencies (Table II).
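The comparison of observed and expected counts in Table II can be made explicit as a ratio for each separation. The sketch below takes the expected numbers exactly as printed in the table (the text does not say how they were derived) and is only a ratio, not a significance test.

```python
# (observed, expected) counts copied from Table II, keyed by k in "i, i ± k".
OPPOSITE = {1: (95, 100), 2: (70, 88), 3: (120, 74), 4: (128, 66),
            5: (56, 54), 6: (62, 50), 7: (59, 40), 8: (55, 30)}
LIKE = {1: (118, 100), 2: (75, 85), 3: (76, 79), 4: (73, 77),
        5: (58, 61), 6: (53, 52), 7: (62, 42), 8: (46, 34)}


def ratios(table):
    """Observed-to-expected ratio for each separation k."""
    return {k: observed / expected for k, (observed, expected) in table.items()}


if __name__ == "__main__":
    opposite, like = ratios(OPPOSITE), ratios(LIKE)
    for k in sorted(OPPOSITE):
        print(f"i ± {k}: opposite {opposite[k]:.2f}, like {like[k]:.2f}")
```

For oppositely charged pairs the ratio peaks at i ± 4, whereas for like-charged pairs the i ± 3 and i ± 4 ratios are both slightly below one.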

Figure 2. Frequency of occurrence of (a) like pairs and (b) ion pairs on beta strands and
turns. Horizontal axis: type of pair (i ± 1 to i ± 8).

Some examples of exposed helices (not necessarily amphiphilic) with ion pairs are
presented in the form of helical wheels in Figure 5. Trp repressor [8] presents a striking
example of an alpha helix stabilized by ion pairs. The 29 residues at the amino
terminal are exposed to solvent. The first 11 residues are uncharged and nonhelical
while the remaining stretch (12-29) is helical and endowed with four contiguous
i ± 3 type of ion pairs.
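The statement that i ± 3 and i ± 4 residues lie on the same side of the helix, while i ± 1 and i ± 2 lie on opposite sides, follows from helical-wheel geometry. The sketch below assumes an ideal alpha helix with 3.6 residues per turn (100° per residue) and a rise of 1.5 Å per residue; these standard values are not given in the text and are assumptions of this example.

```python
DEG_PER_RESIDUE = 100.0   # assumed ideal alpha helix: 360 / 3.6 residues per turn
RISE_PER_RESIDUE = 1.5    # assumed axial rise per residue, in angstroms


def wheel_offset(k):
    """Smallest angle in degrees between residues i and i + k on a helical wheel."""
    angle = (k * DEG_PER_RESIDUE) % 360.0
    return min(angle, 360.0 - angle)


def axial_separation(k):
    """Distance along the helix axis between residues i and i + k, in angstroms."""
    return k * RISE_PER_RESIDUE


if __name__ == "__main__":
    for k in range(1, 9):
        print(f"i ± {k}: {wheel_offset(k):5.1f} degrees apart on the wheel, "
              f"{axial_separation(k):4.1f} A along the axis")
```

Under these assumptions i ± 3 and i ± 4 are 60° and 40° apart on the wheel, while i ± 1 and i ± 2 are 100° and 160° apart.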

Interaction of Ion Pairs With the Helix Macrodipole. It is found that the frequency
of occurrence of the i ± 1 type of like-charged pairs is higher than that of the
corresponding ion pairs by 19.5% (118 versus 95; Table II). A total of 37% of these
like-charged pairs are situated at the ends of the helix. The presence of adjacent negatively
charged residues and positively charged residues at the amino and carboxyl ends of the helix,
respectively, is favored by the helix dipole. This preference is consistent with an earlier
observation that there is a preference for a single appropriately charged residue at
the helix termini [9, 10]. When we consider the three terminal residues of the helices,
it is found that the termini of 42% of the alpha helices contain the appropriately
charged residues, 39% contain neutral residues, and 19% contain incorrectly
charged residues that violate the alpha helix dipole. It appears then that the helix dipole
is not a major factor contributing to helix stability.

Of the four possible combinations of ion pairs involving the residues Glu−, Asp−,
Lys+, and Arg+, the Lys+-Glu−/Glu−-Lys+ combination is the most favored. Similarly,
Arg+ prefers association with Glu− rather than with Asp−. Glu− shows a stronger preference
than Asp− to occur in association with a positively charged residue. It is also found
that in general the negatively charged residue precedes the positively charged residue.


Figure 3. Frequency of occurrence of ion pairs (a) in the proteins as a whole (unshaded
part) and in the alpha helices (shaded part), and (b) in the nonhelical regions, obtained
as the difference between ion pairs in proteins and in helices. Axes: number of ion pairs
versus type of pair (i ± 1 to i ± 8).
Figure 4. The normalized frequencies of ion pairs in alpha helices. The normalized
frequencies represent the density of ion pairs per turn. Notice that the normalized frequencies
of the i ± 3/4 ion pairs increase with the helix length. Helices with three turns particularly
show a higher density of ion pairs. Horizontal axis: number of helical turns.


Influence of Ion Pairs/Charged Residues on Helix Stability. It is found that about
17% of the ion pair distances are less than 4 Å (hydrogen bonded), 43% are at distances
ranging from 4 to 7 Å (electrostatic interactions/water bridges), and the remaining
40% are beyond 7 Å (hydrogen bonded to solvent). The like-charged pairs are mainly
found at distances greater than 4 Å.

It  is  known  that  the  solvation  energies  of  charged  residues  contribute  to  the  stabil¬ 
ity  of  a  molecular  structure  [11].  This  may  be  relevant  to  the  observation  that  shorter 
helices  can  “survive”  with  only  charged  residues  or  fewer  ion  pairs  [7]  while  the 
longer  helices  are  dependent  on  the  larger  concentration  of  ion  pairs.  The  ion  pairs  in 
the  exposed  alpha  helices  seem  to  have  increased  solvation  potentials  compared  to  the 
randomly  distributed  charged  residues  in  an  alpha  helix. 


STABILIZATION  OF  ALPHA-HELICES  BY  ION  PAIRS 




Figure 5. Alpha helical wheels looking from the carboxyl to the amino end: (a) adenylate kinase
(residues 144-158), 6 ion pairs, (b) myohemerythrin (residues 69-87), 4 ion pairs, (c) trp
repressor (residues 69-87), 5 ion pairs, and (d) troponin C (residues 84-101), 6 ion pairs.


Conclusion 

We have shown that in exposed alpha helices ion pairs* of the type i, i ± 3 and i,
i ± 4 occur with greater frequency than expected [7]. It appears that the ion pairs
contribute to the stability of the exposed alpha helices by protecting the helix backbone
hydrogen bonding from “erosion.” The ion pairs seem to stabilize the helices by
their solvation potentials, hydrogen bonding, and electrostatic interactions between
the oppositely charged residues. Although the like pairs are generally fewer in number,
they may also have a stabilizing effect on exposed alpha helices because of their
solvation potentials. The greater propensity for ion pairs results from the possibility
of direct hydrogen bonding and electrostatic interactions between the oppositely
charged residues.

*The ion pairs may be referred to as secondary ion pairs as they stabilize the alpha helices, while the ion
pairs that stabilize tertiary structure may be referred to as tertiary ion pairs.

It appears that both ion pairs [7] and hydrophobic triplets [12] have a stabilizing
role on protein secondary structure; a buried alpha helix contains more hydrophobic
triplets while an exposed alpha helix contains more ion pairs. The amphiphilic helices
will be stabilized by hydrophobic triplets on the buried side and by ion pairs on the
solvent side. Thus, several factors seem to contribute to the stability of alpha helices:
ion pairs, hydrophobic interactions, and helix-dipole interactions.

Our work has established a link between the fibrous and globular families of
proteins in that the long alpha helices of fibrous proteins require numerous ion pairs
or charged residues, while the shorter alpha helices of the globular proteins require
fewer ion pairs. In other words, the greater the exposed area of a protein/helix
(elongated molecule) the greater is the ratio of charged to apolar residues, while the lower
the exposed area (globular molecule) the lower is the ratio of charged to apolar
residues. The “fibroglobular” (troponin C) family of proteins belongs to a third class
which shares some features with both the globular and fibrous proteins.
Abstract

We have recently discovered a new, entirely unexpected, and highly selective protein-ligand interaction.
This new kind of molecular interaction was recognized by chromatography of proteins on divinylsulfonated
agarose gels which had been deactivated using mercaptoethanol. The essential structure of the interacting
immobilized ligand is quite simple and nonionic. It can be generally represented by

agarose-O-CH2-CH2-SO2-CH2-CH2-X-

where X was first a thioether but can also be N or O (S > N > O) or possibly any atom with at least one
lone electron pair. We have provisionally termed peptides and proteins interacting with this ligand
“thiophilic,” in recognition of their affinity for the definitive thioether sulfone constituents. The thiophilic
adsorption process is promoted by water-structuring or “antichaotropic” salts such as sulfates or phosphates
and would appear to be entropically driven. The thermodynamics of such a process are discussed relative
to protein recognition of the immobilized thioether-sulfone ligand. We do not yet know the precise
mechanism for the interaction but we believe that salt allows the protein into close contact with the immobilized
thioether-sulfone group where short-range forces are likely to be important. Evidence suggests that
aromatic side chains on the protein-binding site may be involved and we therefore expect that some kind of
electron-donor-acceptor or proton-acceptor mechanism is likely involved. Two important applications of
thiophilic adsorption are the selective immobilization of functional antibodies as well as purification of
immunoglobulins from serum, ascites fluid, and hybridoma cell culture media. Monoclonal antibodies can be
purified in one step under extremely mild (structure-stabilizing) conditions. We therefore consider the
further characterization of thiophilic adsorption of major significance in the fields of immunology and
biotechnology and hope that this presentation will inspire attempts to explain the interaction in terms of
quantum chemistry.


Introduction 

In  living  matter  all  information  transfer,  energy-converting  systems,  and  structural 
components  are  dependent  on  molecular  interactions  which  in  their  simplest  form  re¬ 
quire  interplay,  involving  affinity  relationships,  between  two  molecules  or  even  two 
separate atomic groupings on a single molecule. These interactions can occur entirely
in free solution but more often, at least in biological systems, molecular interactions
involve  the  constant  association  and  dissociation  of  solutes  with  membrane-  and/or 
particle-immobilized  ligands  or  acceptor  sites.  Simple  approximations  of  these  types 
of  solution/solid  phase  interfacial  recognition  events  may  be  evaluated  and  modelled 
in  vitro  by  chromatography  using  simple,  chemically  defined  ligands.  Conversely,  as 
outlined  here,  when  we  encounter  previously  unrecognized  interaction  phenomena 
during  chromatography  of  biomolecules  in  aqueous  solutions  there  is  some  chance 
that we will find them also, in one form or another, in biological systems. Our purpose
here is to present one specific and intriguing example of a larger concept: utilizing
chromatography for the development of model interfacial ligand:ligate systems to
better  understand  the  individual  driving  forces  responsible  for  specific  solid-phase 
biomolecular  interactions. 

During  investigations  designed  to  prepare  a  more  desirable,  activated  solid  phase 
to immobilize ligands useful for protein fractionation, we appear to have discovered
a kind of molecular interaction not previously described or utilized [1]. The
discovery  was  serendipitous.  Spherical  beads  of  agarose  (6%)  were  simultaneously 
crosslinked  and  activated  with  divinylsulfone  (DVS).  While  DVS  crosslinking  con¬ 
fers  rigidity  upon  the  gel  skeleton  and  improves  the  chromatographic  performance  of 
the stationary phase at elevated pressures (i.e., high-performance liquid chromatography,
"HPLC"), it also activates the gel for further derivatization. However, when the
remaining active vinyl groups were eliminated using the simple nucleophile
β-mercaptoethanol to prevent nucleophilic components in the mixture to be fractionated
from  themselves  becoming  permanently  (covalently)  attached  to  the  gel,  the 
“deactivated”  gel  product  so  obtained  revealed  a  selective  adsorption  property  unlike 
any  previously  described.  The  new  gel  derivative  has  tentatively  been  named  the 
T  gel  in  reference  to  the  Thioether  structure  of  the  immobilized  ligand.  Similarly, 
the  specific  adsorption  behavior  of  certain  peptides  and  proteins  toward  the  T  gel 
has  been  designated  “thiophilic”  to  recognize  the  affinity  of  interacting  molecules 
for the sulfur groups in the thioether-sulfone ligand [2, 3]. The structure of the T gel
is  schematically  provided  in  Figure  1.  We  believe  it  is  important  to  emphasize 
the simple and nonionic nature of the interacting ligand. We will describe some
unique properties of thiophilic adsorption which, not unlike hydrophobic interaction
(e.g., on aliphatic hydrocarbons), is salt-promoted and, in part, an entropy-driven
process influenced by ordered water structure at the protein-solvent interface. As
a prelude, we briefly review the utility of column chromatography for
the preliminary experimental identification and characterization of new molecular
interactions.

Figure 1. Structure of T-gel ligand.


Chromatography  and  Molecular  Adsorption 

Molecular adsorption in gels, as in the more complicated biological membranes,
requires considerations belonging to the realm of solid-phase chemistry and therefore
is set apart from free solution in several respects. It is important to understand that a
solid matrix, especially when it contains an interacting species (i.e., immobilized ligand),
restricts diffusion and mobility of the ligand-ligate complex (Fig. 2). However,
even though it is only a rough approximation, it remains instructive in this context to
consider the kinetics and thermodynamics of aqueous solutions starting with the law
of mass action:

$$\text{Ligand} + \text{Ligate} \rightleftharpoons \text{Ligand:Ligate complex}$$

$$K_a = \frac{[\text{Ligand:Ligate complex}]}{[\text{Ligand}]\,[\text{Ligate}]}$$


Figure 2. Dynamics of immobilized ligand-ligate interaction during chromatography. [Panel labels: ligand accumulates at stationary phase in the presence of water-structuring salt; eluent flow velocity vectors (f); adsorption with structured water.]

where the ligate or solute, for example, may be a protein with a ligand-binding site.
The equilibrium formation or association constant (K_a) as defined above is related to
the  distribution  or  partition  coefficient  ( K )  of  the  interacting  species  in  the  stationary 
and  mobile  phases  of  a  chromatographic  bed  and  consequently  also  to  the  retardation 
of  the  ligate  on  its  passage  through  the  chromatographic  bed  of  immobilized  ligand 
(see  Giddings,  Ref.  4): 

$$\frac{R}{1-R} = \frac{[\text{ligate}]_m V_m}{[\text{ligate}]_s V_s} = \frac{V_m}{K V_s}$$

where R is defined as the fraction of ligate in the mobile phase (m) at equilibrium so
that 1 - R is the fraction of ligate in the stationary phase (s), that is, associated with
immobilized  ligand.  The  more  familiar  equation  of  Martin  and  Synge  [5]  is  obtained 
upon  solving  the  above  equation  for  R : 


$$R = \frac{V_m}{V_m + K V_s} \tag{3}$$


Note that for adsorption chromatography, the volume of the stationary phase (V_s) is
probably  best  replaced  by  solute-accessible  or  effective  surface  area,  which  can 
depend  on  the  size  of  the  ligate. 
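
The step from the ratio form to the Martin and Synge expression is short enough to write out. The following is a sketch of the algebra, assuming that the partition coefficient is defined as K = [ligate]_s/[ligate]_m and that V_m and V_s denote the volumes of the mobile and stationary phases:

```latex
\begin{align*}
\frac{R}{1-R} &= \frac{[\text{ligate}]_m V_m}{[\text{ligate}]_s V_s} = \frac{V_m}{K V_s} \\
R\,K V_s &= (1-R)\,V_m \\
R\,(V_m + K V_s) &= V_m \\
R &= \frac{V_m}{V_m + K V_s}
\end{align*}
```

In the limit K = 0 this gives R = 1 (all ligate in the mobile phase), and R falls toward zero as K grows.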

Indeed, the chromatographic elution volume (V_e) of any given ligate (e.g., peptide
or  protein)  is  related  to  the  physical  properties  of  the  adsorption  matrix  (e.g.,  agarose 
gel)  such  as  particle  size,  geometry,  and  porosity  as  well  as  its  essential  and  defini¬ 
tive  chemical  properties,  namely,  ligand  type  and  density.  In  practice,  however,  these 
parameters  act  collectively  to  help  determine  the  partition  coefficient  K  and  are  re¬ 
lated  to  the  elution  volume  according  to  the  following  expression: 


$$V_e = V_m + K V_s \tag{4}$$

or

$$K = \frac{V_e - V_m}{V_s} \tag{5}$$


Equation  (5)  clearly  shows  the  necessity  to  carefully  define  reference  elution  volumes 
to  accurately  determine  the  extent  and  specificity  of  the  molecular  interaction  be¬ 
tween  ligand  and  ligate.  It  was  exactly  these  considerations  that  led  to  the  discovery 
of  thiophilic  adsorption.  To  illustrate,  consider  a  chromatographic  process  in  a  bed  of 
granular solid support of total volume V_t consisting of a gel made up of a molecular
network  that  is  in  part  permeable  to  the  solutes  (ligates)  under  study  (Fig.  3).  If  a 
substance  cannot  permeate  the  granular  particles  it  travels  only  in  the  interstitial  space 
between the grains (voids) and appears in the column eluate after passage of a volume
of eluent equivalent to the volume of the voids, V_0. In other words, it travels
with the speed of the liquid eluent and V_e = V_0. Solutes are usually not separated
(K = 0) under these conditions (“hydrodynamic” separation is to a limited extent
possible).  If  there  is  no  molecular  interaction  taking  place  but  a  solute  permeates 
parts  of  the  internal  gel  regions  it  lags  behind  the  moving  front  of  the  liquid  eluent 
and appears in the column eluate after some characteristic retention volume V_e, but
before the total volume (i.e., V_0 < V_e < V_t).

Figure 3. Reference elution volumes necessary for characterizing ligate elution behavior
during chromatography. [Horizontal axis: elution volume.]

This elution volume depends on the interrelationship
in molecular size between the solute molecules and the pores of the
matrix.  The  process  is  called  molecular  sieve  or  size-exclusion  chromatography  and 
was discovered by one of us (Porath [6], together with Per Flodin, about 30 years ago)
using  cross-linked  dextran  (Sephadex).  However,  it  is  essential  at  this  point  to  recog¬ 
nize  that  the  nature  of  the  solvent  system  or  eluent  itself  can  dramatically  influence 
solute  elution  volume.  This  was  found  to  be  particularly  true  of  thiophilic  adsorption, 
a  process  which  is  differentially  affected  by  the  type  and  concentration  of  salt  present 
in  solution.  If  a  water-structuring  salt  (e.g.,  sulfate  or  phosphate)  is  added  to  the  sol¬ 
vent  in  relatively  high  concentration,  solute  molecules  may  be  forced  out  of  the  bulk 
solution  and  accumulate  at  the  phase  boundary,  that  is,  adjacent  to  the  polymer 
chains  making  up  the  gel  network  of  immobilized  ligand.  This  salt-promoted  accu¬ 
mulation  delays  diffusion  of  the  solute  under  study  and  consequently  the  zone  con¬ 
taining  it  is  broadened  and  further  retained  compared  to  the  case  in  salt-free  solution. 
Under such circumstances, the solute now appears after an elution volume, V_e', later
than V_e and often in a volume exceeding V_t (Fig. 3). This retardation phenomenon,
caused  by  the  salt,  can  be  referred  to  as  salt-promoted  adsorption.  It  is  apparently  far 
more  common  than  believed  earlier.  In  fact,  salt-promoted  adsorption  of  aromatics  to 
Sephadex  was  observed  [7]  already  around  the  time  of  the  discovery  of  molecular 
sieving [6]. Salts influencing protein adsorption on uncharged amphiphilic gels can
be arranged in a so-called Hofmeister [8] series according to their effects on both
protein  and  water  structure: 

Phosphates  >  sulfates  >  acetates  >  chlorides  >  nitrates  >  thiocyanates 

Salts  on  the  right  side  of  this  scale  promote  protein  solubility  (salting-in)  and  can,  in 
fact,  promote  denaturation  due  in  part  to  their  disruptive  influence  on  bound  water 
structure. They are thus referred to as chaotropic [9]. Salts on the left side of this
scale  promote  protein  stability  and  precipitation  (salting-out)  and  are  said  to  have  the 
opposite effect on water structure. To our knowledge, the mechanism(s) of salt-promoted
retardation or adsorption have never been explained. The salt-promoted
concentration  of  ligate  at  the  matrix  boundary  may  sufficiently  increase  the  equi¬ 
librium  association  constant  so  as  to  greatly  enhance  its  interaction  with  the  immobi¬ 
lized  ligand.  A  much  decreased  affinity  would  be  expected  in  the  absence  of 
water-structuring  (antichaotropic)  salts.  In  fact,  adsorption  capacity,  as  well  as 
adsorption  strength,  appears  to  increase  in  parallel  with  increasing  salt  concentration 
for  all  kinds  of  salt-promoted  adsorption.  If  the  salt  types  responsible  for  promoting 
ligate  adsorption  to  certain  solid  matrices  (e.g.,  T  gel)  are  acting  by  their  influence 
on  the  organization  of  water  at  or  near  the  surface  of  either  ligand  or  ligate  then  a 
consideration  of  entropy  is  important. 
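
The relations among R, K, and the elution volume discussed in this section can be explored numerically. The sketch below assumes the forms R = V_m/(V_m + K V_s) and V_e = V_m + K V_s; the function names and the column volumes are hypothetical and serve only as illustration, not as measured data.

```python
def fraction_mobile(K, V_m, V_s):
    """Fraction R of ligate in the mobile phase, R = V_m / (V_m + K*V_s)."""
    return V_m / (V_m + K * V_s)


def elution_volume(K, V_m, V_s):
    """Elution volume V_e = V_m + K*V_s."""
    return V_m + K * V_s


def partition_coefficient(V_e, V_m, V_s):
    """Partition coefficient recovered from a measured elution volume."""
    return (V_e - V_m) / V_s


# Hypothetical column: 3 mL mobile phase and 7 mL stationary phase (10 mL total).
V_m, V_s = 3.0, 7.0
for K in (0.0, 0.5, 1.0, 4.0):
    V_e = elution_volume(K, V_m, V_s)
    R = fraction_mobile(K, V_m, V_s)
    print(f"K = {K:3.1f}  V_e = {V_e:5.1f} mL  R = {R:.3f}")
```

With these assumed volumes, any K greater than 1 places the elution volume beyond the total bed volume (V_m + V_s), which is the situation described for salt-promoted adsorption.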

Thermodynamic  Considerations 

The  distribution  coefficient  K  as  expressed  in  Eqs.  (3)  and  (4)  may  be  related  to 
the  Gibbs’  free  energy  of  interaction  by  the  equation: 

$$\Delta G = -RT \ln K \tag{6}$$

ΔG may also be expressed as:

$$\Delta G = \Delta H - T\,\Delta S \tag{7}$$

where ΔH, the enthalpy, accounts for the energy of direct interaction between ligands
and ligates and ΔS, the entropy term, accounts mainly for overall changes in
water structure as a consequence of complex formation. The relative contributions of
ΔH and TΔS toward the solubility of nonpolar solutes in water, as well as the formation
(stabilization) of hydrophobic bonds between nonpolar solutes in water, can
provide insight into the contributions of water structure during salt-promoted ligate
interactions with immobilized ligands. It is suggested that the energy required
(ΔH > 0) to solubilize a nonpolar solute is more than offset by the energy gained
(ΔH < 0) due to the formation of new hydrogen bonds at the solute-solvent interface
[10]. The association of two or more nonpolar groups (or any groups, e.g.,
ligand:ligate, with bound or ordered water molecules) in an aqueous environment
would then be expected to decrease the ordered water structure at their interacting
surfaces and result in an increase in entropy. The ΔH term in hydrophobic interactions
is small due to the presence of only weak dispersion forces. Thus, generic
“hydrophobic interactions” are thought to be entropically “driven” (ΔG ≈ -TΔS)
and favored (ΔG < 0) with increased temperature. A simplistic schematic of this
process is shown in Figure 4. However, is the favorable entropy change associated
with the displacement of bound water a major factor in stabilizing other types of
protein-ligand interactions? What does ligand structure contribute to
this process?
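
The temperature argument can be made concrete with a small calculation. The sketch below assumes the relations ΔG = -RT ln K and ΔG = ΔH - TΔS with temperature-independent ΔH and ΔS; the numerical values of ΔH and ΔS are hypothetical, chosen only to represent an entropy-driven association, and the function names are ours.

```python
import math

R_GAS = 8.314  # gas constant, J mol^-1 K^-1


def delta_g_from_k(K, T):
    """Free energy (J/mol) from a distribution coefficient: dG = -R*T*ln(K)."""
    return -R_GAS * T * math.log(K)


def delta_g(dH, dS, T):
    """Free energy (J/mol) from enthalpy and entropy: dG = dH - T*dS."""
    return dH - T * dS


def k_from_delta_g(dG, T):
    """Distribution coefficient implied by a given free energy."""
    return math.exp(-dG / (R_GAS * T))


# Hypothetical entropy-driven interaction: small unfavorable enthalpy,
# favorable entropy (values are illustrative, not measured).
dH, dS = 5000.0, 40.0  # J/mol and J/(mol K)
for T in (277.15, 298.15, 310.15):
    dG = delta_g(dH, dS, T)
    print(f"T = {T:.2f} K  dG = {dG:8.1f} J/mol  K = {k_from_delta_g(dG, T):.2f}")
```

Under these assumptions ΔG becomes more negative, and K larger, as the temperature rises. An interaction that instead weakens on warming would require a favorable (negative) ΔH contribution.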


Thiophilic  Adsorption 

Distinguishing  Structural  Requirements  of  the  Immobilized  Ligand 

Figure 4. A simple explanation of hydrophobic interaction (panel title: solute-bound water
and change in entropy). The ΔH is small because it is made up of contributions due to weak
dispersion forces. The TΔS term becomes large due to disordering of the expelled water.
These terms therefore give ΔG < 0 and constitute a “driving force” for complex formation.

Perhaps a reasonable reference point from which to distinguish thiophilic adsorption
from the more widely observed but less discriminating hydrophobic interactions
is to consider the structure of the interacting ligands involved. Although widely
varied  in  structure,  representative  ligands  for  the  generic  hydrophobic  adsorption 
of  peptides  and  proteins  to  gels  with  a  hydrophilic  network  structure  include 
C4, C6, C8 ... C18 linear aliphatic hydrocarbon chains (e.g., Ref. 11). Now let us
examine  the  structure  of  the  T  gel  and  compare  it  to  several  other  thiophilic  as  well  as 
nonthiophilic  ligands  recently  synthesized.  The  following  immobilized  ligands  have 
all  been  synthesized  and  are  at  various  stages  of  evaluation  using  chromatography  in 
an  effort  to  determine  the  precise  contribution  of  individual  atoms  toward  the  thio¬ 
philic  adsorption  process: 


T gel        Agarose-O-CH2-CH2-SO2-CH2-CH2-S-CH2-CH2-OH

Aliphatic T gel derivatives with thiophilic properties:

             Agarose-O-CH2-CH2-SO2-CH2-CH2-S-CH2-CH2-SH
A gel        Agarose-O-CH2-CH2-SO2-CH2-CH2-NH2
N gel        Agarose-O-CH2-CH2-SO2-CH2-CH2-NH-CH2-CH2-OH
             Agarose-O-CH2-CH2-SO2-CH2-CH2-NH-CH2-COOH
AN gel       Agarose-O-CH2-CH2-SO2-CH2-CH2-NH-CH2-C≡N
PTN gel      Agarose-O-CH2-CH2-SO2-CH2-CH2-NH-(propenetrinitrilo group, PTN)
DPA gel      Agarose-O-CH2-CH2-SO2-CH2-CH2-N(CH2-C≡CH)2

Other T gel derivatives contain aromatic substitutions and are referred to as -S-π
adsorbents:

—CH2—CH2—SO2—CH2—CH2—S—R

e.g.,

—S—triazole          Tr-gel
—S—methoxyl phenyl   M-gel
—S—pyridine          P-gel

These -S-π gels exhibit distinctive adsorption properties which are discussed elsewhere
[12, 13].

The  T-gel  ligand,  although  uncharged,  appears  to  be  the  most  effective  one  in 
terms of strength and capacity. Still another hydroxyl (—OH) or even a thiol (—SH)
can be introduced in the T-gel ligand with characteristic T-gel adsorption capacity
essentially retained [1]:

—O—CH2—CH2—SO2—CH2—CH2—S—CH2—CHOH—CH2—OH
—O—CH2—CH2—SO2—CH2—CH2—S—CH2—CHSH—CH2—OH

Further, the ligand:

—O—CH2—CH2—SO2—CH2—CH2—S—CH2—CH(OH)—CH(OH)—CH2—SH

has a similar adsorption capacity [1]. Thus it seems that thiophilic interaction, from
the  view  of  the  ligand,  should  in  fact  be  considered  to  be  hydrophilic  rather  than 
hydrophobic. Like the T gel, the A gel and DPA gel (J. Porath, in preparation) adsorb
immunoglobulins and α2-macroglobulin from human serum. The A gel is more
hydrophilic than hydrophobic and the hydrophobicity of the DPA gel is not strong
enough  to  make  it  an  adsorbent  for  serum  albumin. 

A  ligand  with  only  thioether  sulfur,  as  obtained  by  coupling  mercaptoethanol  to 
oxirane-agarose 

—CH2—CHOH—CH2—S—CH2—CH2—OH

does not convert agarose to a protein adsorbent, nor is a ligand containing only
a sulfone group in an aliphatic surrounding thiophilic. In fact, the ligand shown below
appeared to be relatively inert as a protein adsorbent [1]:

—O—CH2—CH2—SO2—CH2—CH2—OH

Similarly, the following structure (derived from oxirane-activated agarose):

—CH2—CHOH—CH2—O—CH2—CH2—OH

although  similar  to  the  T-gel  ligand  does  not  have  a  sulfone  group  and  was  found  not 
to  have  thiophilic  (or  any  other)  protein  adsorption  property  under  the  conditions 
evaluated  [14].  Further,  when  the  propenetrinitrilo  (PTN)  group  is  attached  via  an 
oxirane coupler (—O—CH2—CHOH—CH2—PTN) instead of the divinylsulfone
coupler (—SO2—CH2—CH2—PTN), little or no protein adsorption occurs [14].
The  thioether  sulfur  and  the  sulfone  group  likely  cooperate  when  interacting  with  the 
aromatic  side  chains  in  the  proteins.  It  currently  appears  as  though  a  necessary  and 
probably  sufficient  structural  feature  for  thiophilicity  is  contained  in  the  sequence: 
—SO2—CH2—CH2—X—, where, in the absence of aromatic substitutions, X can
be  any  atom  with  a  lone  electron  pair. 

How  can  this  structure  so  specifically  interact  with  the  surface  of  certain  proteins? 
Is the acceptor site a π-electron system such as the indole nucleus of a tryptophan
residue  located  in  an  accessible  cavity  or  at  the  surface  of  the  protein  molecule?  We 
believe  that  some  kind  of  ring  structure  may  be  formed  by  transfer  of  electrons  or 
protons  between  the  ligand  site  (“S  site”)  and  the  corresponding  countersite  on  the 
protein.  This  electron-donor-acceptor  relationship  is  schematically  depicted  in  Fig¬ 
ure  5. 


Figure 5. Possible electron-donor-acceptor thiophilic interaction mechanism.

Properties that Distinguish Thiophilic Adsorption from Generic
Hydrophobic  Interactions 

We  must  recognize  that  as  we  create  ligands  increasingly  specific  for  a  certain  ar¬ 
chitecture  or  binding  site(s)  on  the  surface  of  proteins  it  necessarily  becomes  increas¬ 
ingly  difficult  to  categorize  these  proteins  into  groups  according  to  their  specific 
adsorption  behavior  (as  is  the  case,  for  example,  in  anion-  or  cation-exchange  chro¬ 
matography).  However,  because  new  types  of  protein  adsorption  phenomena  will  in¬ 
evitably  be  discovered  and  exploited,  attempts  should  be  made  to  formulate  these 
observations  into  mechanisms  which  recognize  and  successfully  challenge  existing 
boundaries  of  group  fractionation  theory.  We  find  that  proteins  often  used  as  model 
standards to characterize the adsorption behavior of hydrophobic gels (e.g., human
and  bovine  serum  albumins,  ovalbumin,  myoglobin,  ribonuclease  A,  and  cytochrome 
c) do not interact with the T gel under conditions where immunoglobulins are effectively
adsorbed [2]. Similarly, under identical conditions, alcohol dehydrogenases 1
and 2 (from Zymomonas mobilis) passed through the T gel unadsorbed but were
strongly adsorbed to either C4 or C8 hydrophobic gels attached in tandem [15].

We now review arguments in favor of the view that thiophilic interaction is different
in nature from hydrophobic interaction as exerted between two alkyl chains
in aqueous solution. The hydrophobic gel (H gel) referred to in the following contains
an octyl (C8) ligand linked via 1,4-butanediol diglycidyl ether to the matrix in a
gel consisting of 6% agarose [16].

Consider  two  chromatographic  beds  connected  in  tandem,  the  upper  one  containing 
T gel, the lower one H gel [1]. The gels were equilibrated with a 0.05 M Tris-HCl
buffer  of  pH  7.5  containing  0.5  M  potassium  sulfate,  the  latter  being  the  adsorption- 
promoting,  water-structuring  salt.  A  sample  of  human  serum,  likewise  equilibrated 
with  the  sulfate-containing  buffer,  was  introduced  at  the  top  of  the  T-gel  column  fol¬ 
lowed  by  washing  with  sulfate-containing  buffer.  Some  proteins  were  adsorbed  in  the 
T-gel  column,  other  proteins  in  the  H-gel  column.  The  remaining  proteins  were 
eluted unretained by either column. After disconnecting the composite T → H
column  the  proteins  were  desorbed  separately  from  each  bed  simply  by  eluting  with 
sulfate-free  buffer.  Gel  electrophoretic  analysis  of  the  two  separate  fractions  revealed 
that  the  H  gel,  as  expected,  had  depleted  the  serum  with  respect  to  albumin.  The  T 
gel, surprisingly, had selectively removed the immunoglobulins and α2-macroglobulin.
Reversing the tandem order of the beds (i.e., H → T) gave identical results, thus
proving the difference in kind of interaction. The H gel and T gel also respond differently
in protein adsorption behavior as a function of salt type used.

Potassium  sulfate  is  just  one  of  several  simple  salts  that  can  be  used  to  increase  ad¬ 
sorption  on  both  the  T  gel  and  H  gel.  Shortly  after  its  first  description,  hydrophobic 
adsorption to an uncharged gel was shown to be promoted by chlorides [17] and,
later,  by  phosphates  and  sulfates.  In  contrast,  thiophilic  adsorption  is  effectively  pre¬ 
vented  by  sodium  chloride.  In  fact,  0.5  M  sodium  chloride  has  been  used  to  further 
increase  the  specificity  of  the  T  gel  for  immunoglobulins  by  preventing  adsorption  of 
α2-macroglobulin [2]. But even immunoglobulins are not adsorbed well if high
enough concentrations of sodium chloride are present [18]. Conversely, sodium chloride
is quite effective at promoting adsorption of serum albumins to hydrophobic adsorbents
[11, 18]. Thus, the division line between those salts promoting adsorption
and  those  promoting  desorption  is  located  differently  in  the  Hofmeister  series  for  H 
and  T  gels,  and  we  have  another  argument  in  favor  of  our  proposal  that  H  and  T  in¬ 
teractions  are  distinctly  different  in  nature. 

We  can  next  compare  the  effects  of  temperature  on  ligate  adsorption  by  the  H  and 
T gels to help distinguish the two adsorption mechanisms involved. We have indicated
in our discussion of Eq. (7) why hydrophobic interactions increase with temperature.
This appears to have been verified experimentally [e.g., Refs. 10, 11].
However, immunoglobulins are more strongly adsorbed to the T gel at lower temperature
[1]. What about the simpler model substances? The tripeptide LeuLeuLeu
is  rather  hydrophobic  and  was  not  very  strongly  adsorbed  to  the  T  gel.  Nevertheless, 
adsorption  on  the  T  gel  became  even  weaker  and  approached  zero  with  increasing 
temperature.  The  opposite  temperature  effect  was  observed  (as  expected)  for  the  H 
gel and may provide a basis for our third argument in favor of the difference in interaction.
An interesting and surprising set of results was obtained with the dipeptide
TrpTrp.  TrpTrp  showed  very  strong  thiophilic  and  hydrophobic  adsorption  properties, 
and  its  interaction  with  both  the  T  gel  and  H  gel  was  quite  significantly  decreased 
with  increasing  temperature.  The  temperature-dependent  behavior  of  TrpTrp  on  the 
H  gel  was  quite  contrary  to  that  expected  based  upon  information  often  presented  in 
the  literature.  More  work  is  underway  to  better  evaluate  temperature  effects  on  the 
adsorption  behavior  of  model  (structurally  defined)  ligates. 

Finally,  by  constructing  a  ligand  composed  of  two  segments  or  interactive  sites, 
one  common  to  the  H  ligand  and  the  other  to  the  T  ligand,  a  superposition  of  both 
interaction  effects  is  obtained.  Thus,  coupling  of  octanethiol  to  a  divinylsulfone- 
activated  matrix  produces  a  gel  which  adsorbs  albumin  as  well  as  immunoglobulin 
and in both cases adsorption is very much stronger than on gels with the more simply
composed ligands. This observation also argues against H and T interactions being
identical in nature. The superposition principle shows that the H and T interactions
do  not  exclude,  but  rather  reinforce  each  other  although  both  may  involve  related 
target  or  countersites  on  the  protein  surface.  In  passing,  it  may  be  worth  mentioning 
that  the  use  of  double-  or  triple-affinity  principles  seems  to  offer  a  new  unexplored 
road  to  immobilize,  on  solid  support,  proteins  and  protein-containing  aggregates  or 
biological  systems  of  higher  order  such  as  viruses  and  cells. 

Proposed  Thiophilic  Interaction  Mechanisms 

Let  us  now  turn  to  the  problem  of  how  to  identify  the  thiophilic  interaction  site(s)  on 
proteins or other ligates. Amino acids are too weakly adsorbed to give conclusive results.
To intensify the interaction we tested some very simple homopeptide structures.
The involvement of aromatic amino acids during peptide adsorption was thereby suggested.
When we compared the results of homopeptide adsorption evaluations (relative
partition coefficient) on both the H gel and T gel, we found the following series
of  increasing  adsorption  strength: 

H: immunoglobulins ≪ -Tyr- < -Phe- < -Trp- ≪ serum albumin

T: serum albumin ≪ -Tyr- < -Phe- < -Trp- ≪ immunoglobulins

The relative elution order among the homopeptides analyzed was different from
that  found  earlier  for  Sephadex  (and  agarose): 

-Phe-  <  -Tyr-  <  -Trp- 

where  in  this  latter  case,  the  interaction  is  much  weaker.  We  are  continuing  to  inves¬ 
tigate  the  thiophilic  behavior  of  synthetic  model  peptides  as  well  as  naturally  occur¬ 
ring  peptides  of  various  sizes.  While  it  may  continue  to  be  instructive  to  evaluate 
structurally  defined  peptide  interactions  with  the  T  gel  under  varied  experimental 
conditions, interpretation of the results may not be entirely relevant to the mechanism(s)
by which thiophilic proteins are adsorbed.

The identification of specific T-gel ligand acceptor sites on the surface of a thiophilic
protein is much more complicated. To promote further discussion, Figure 6 illustrates,
schematically, the interaction of ligand and counterligand in a hydrophobic
pocket at the protein surface. Although thiophilicity may be exerted without intervening
antichaotropic salts, the latter effectively force the interacting species (T-gel ligand
and its acceptor site) into close proximity which, in addition to the lower
dielectric constant within the pocket, accounts for a considerable reinforcement in the
interaction.  This  situation  may  resemble  steroid  hormone-receptor  association  and 
certain  antigen-antibody  complex  formations  as  well  as  other  forms  of  biologically 
significant  complexation  phenomena. 

In  an  effort  to  better  understand  the  thiophilic  adsorption  process,  the  ligate  or 
protein-acceptor site needs to be quantitatively characterized in terms of its interaction
with the T-gel ligand. Toward this end, we have selected as models a variety of
physically well-characterized proteins. The three-dimensional structures of some of
these  proteins  have  been  solved  by  crystallization  and  X-ray  diffraction. 

The  relative  thiophilicity  of  these  proteins  was  first  evaluated  using  the  T-gel  and 
buffer conditions (specifically 10% (0.76 M) ammonium sulfate in 20 mM HEPES
buffer  pH  7.5)  consistent  with  the  selective  yet  maximal  adsorption  of  immunoglobu¬ 
lins  from  unfractionated  serum  [2, 3]  or  cell  culture  medium.  Under  these  experimen¬ 
tal  conditions,  the  model  proteins  could  be  arbitrarily  classified  as  either  thiophilic  or 
nonthiophilic  based  upon  percent  of  total  protein  adsorbed  by  a  given  bed  volume  of 
T  gel  (Table  I). 
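
The stated molarity can be checked by a short conversion. This sketch assumes that "10%" denotes 10 g of salt per 100 mL of solution (w/v) and takes the molar mass of ammonium sulfate, (NH4)2SO4, as 132.14 g/mol:

```latex
\[
c = \frac{100\ \mathrm{g\,L^{-1}}}{132.14\ \mathrm{g\,mol^{-1}}}
  \approx 0.757\ \mathrm{mol\,L^{-1}} \approx 0.76\ M
\]
```

The result agrees with the 0.76 M given in parentheses, which suggests that the percentage is indeed expressed on a weight-per-volume basis.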

Adsorption  of  the  thiophilic  proteins  was  then  found  to  be  influenced  by  the  type 
and  concentration  of  water  structure-forming  salt  as  well  as  pH,  with  an  adsorption 
minimum at pH 5-6 [2, 3].

We  have  since  evaluated  means  of  experimentally  altering  (increasing)  the  apparent 
thiophilicity  of  certain  model  proteins  to  better  understand  the  relationship  between 
column  capacity  and  selectivity.  Accordingly,  the  affinity  of  several  model  proteins 
for  the  T  gel  was  evaluated  under  equilibrium  conditions  (batch  incubations)  with  in¬ 
creasing  concentrations  (up  to  20%)  of  water  structure-forming  salt  (ammonium  sul¬ 
fate).  The  data  were  evaluated  according  to  the  method  of  Scatchard  [19] 

$$\frac{\text{Bound}}{\text{Free}} = -\frac{1}{K_d}\,(B) + \frac{B_{\max}}{K_d}$$

Table I. Comparative adsorption of model proteins to the T gel.

Model proteins                                      Percent adsorption to T gel

I.  Thiophilic
    immunoglobulins (human serum)                   98
    lentil lectin (Lens culinaris)                  81
    carboxypeptidase A (bovine pancreas)            80
    trypsin (bovine pancreas)                       50
    trypsin inhibitor (Kunitz, soybean)             19
    trypsin-soybean trypsin inhibitor               97
    chymotrypsin (bovine pancreas)                  18

II. Nonthiophilic
    ovalbumin (chicken egg white)                   0
    ribonuclease A and S (bovine pancreas)          0-1
    cytochrome c (horse and tuna heart)             0
    myoglobin (sperm whale)                         0-4
    carbonic anhydrase I (human)                    2-3
    pancreatic trypsin inhibitor (bovine pancreas)  4-6
    albumins (human and bovine serum)               4-5
    α-chymotrypsinogen A (bovine pancreas)          6-7

Individual proteins (1 mg/2 mL) were applied to identical T-gel columns (0.5 mL) in
column equilibration buffer consisting of 10% ammonium sulfate in 20 mM HEPES, pH
7.5. Unbound proteins were washed through with 4-6 mL of column equilibration
buffer. Bound proteins were eluted with 20 mM HEPES buffer, pH 7.5. Data are presented
as percent of total protein adsorbed. An arbitrary cutoff at 10% was chosen to
distinguish thiophilic from nonthiophilic proteins under the conditions defined.
to determine the equilibrium dissociation constant ($K_d$) and the protein-binding capacity
($B_{\max}$) per unit T-gel ligand. Representative data are shown in Figure 7.
The Scatchard analyses suggest that certain proteins (e.g., immunoglobulins and
lysozyme) may be interacting with the T-gel ligand via more than one ligand-binding
site with varying but relatively high affinities ($K_d = 1$-$8 \times 10^{-6}$ M). The higher affinity
interaction (slope shown) has been corrected for the influence of an apparent second,
lower affinity ($8 \times 10^{-6}$ M) interaction site by the method of Rosenthal [20].
The additional possibility of multiple interaction sites cannot be excluded at this time.
Other proteins (e.g., α-chymotrypsin) appear to be interacting through one or a single
class of lower affinity binding sites ($K_d = 400 \times 10^{-6}$ M). Interestingly, some
proteins (e.g., cytochrome c and myoglobin) show little or no thiophilic behavior
($K = 0$) even in 20% ammonium sulfate. These and similar data were compared to
the observed partition coefficient [calculated from V, using Eq. (5)] of these same


Figure 7. Scatchard analyses of protein interaction(s) with the T-gel thioether-sulfone ligand.
Increasing concentrations (10 points) of purified protein were incubated at constant volume
and temperature (22°C) with a known quantity of T-gel until equilibrium was attained
(<30 min). Gel-bound protein was calculated from the difference between total protein added and
free protein at equilibrium as determined by absorbance.
proteins during (1) zonal elution chromatography under identical, isocratic conditions
and (2) gradient elution chromatography using inverse (e.g., 20% → 0%) salt gradients.
The relative elution order of the model proteins evaluated is shown in Table II.
It would appear from preliminary data that the equilibrium $K_d$ values for the thiophilic
proteins in this series can be correlated with their measured partition coefficients.
However, more data need to be obtained under a variety of experimental conditions
before a satisfactory quantitative relationship can be derived. An increase in
salt concentration or “driving force” can thus promote the binding of additional proteins to
the T gel; however, such an apparent relaxation in selectivity may be a function of
perturbed water/protein structure and/or an additional (new), but extremely weak,
affinity of these proteins for the ligand. Details of this investigation are to be
presented elsewhere [21].
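The Scatchard treatment described above amounts to a straight-line fit of Bound/Free against Bound, with slope $-1/K_d$ and intercept $B_{\max}/K_d$. The following is a minimal sketch of that fit for a single class of sites only; the function names and the numerical values are hypothetical (they are not taken from Figure 7), and no correction for a second site (as in the Rosenthal method) is attempted.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def scatchard(bound, free):
    """Return (Kd, Bmax) from Bound/Free = -(1/Kd) * Bound + Bmax/Kd."""
    ratio = [b / f for b, f in zip(bound, free)]
    slope, intercept = linear_fit(bound, ratio)
    kd = -1.0 / slope          # slope of the Scatchard line is -1/Kd
    bmax = intercept * kd      # intercept is Bmax/Kd
    return kd, bmax


# Hypothetical, noise-free one-site data (assumed values, for illustration).
TRUE_KD = 2.0e-6     # M
TRUE_BMAX = 1.0e-5   # M (binding capacity expressed as a concentration)
free = [0.5e-6 * (i + 1) for i in range(10)]               # 10 points
bound = [TRUE_BMAX * f / (TRUE_KD + f) for f in free]      # one-site isotherm

kd, bmax = scatchard(bound, free)
print(f"Kd = {kd:.3e} M, Bmax = {bmax:.3e} M")
```

With real data, curvature in the Bound/Free versus Bound plot would indicate more than one class of sites, in which case a single straight-line fit of this kind is no longer adequate.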

Finally, the equilibrium affinity constant ($K_d$) and/or partition coefficient ($K$) of
these proteins for the T gel is being examined for possible correlation with type and
accessibility (available surface area) of individual amino acids at the protein’s surface.
Factors including the overall density and relative proximity of both charged as


Table II. Thiophilic adsorption: relative elution behavior of model proteins.

Protein and gradient elution order        Calculated relative elution volume (mL)(a)

 1. Cytochrome c                           1
 2. Myoglobin                              2.1
 3. Ribonuclease S                         3.2
 4. Bovine serum albumin                   3.3
 5. Pancreatic trypsin inhibitor           3.5
 6. Ribonuclease A                         3.8
 7. Human serum albumin                    4.0
 8. Carbonic anhydrase I                   5.0
 9. α-Chymotrypsinogen A                   6.7
10. Ribonuclease S-protein                 7.5
11. Soybean trypsin inhibitor              8.3
12. α-Chymotrypsinogen A                   9.0
13. Lysozyme                               9.7
14. Immunoglobulins (gamma)               11.0
15. Insulin                               11.2
16. Carboxypeptidase A                    11.5

(a) Relative elution volume = (elution volume of protein) / (elution volume of cytochrome c).

Individual proteins were loaded onto a 2 mL (1 x 2.5 cm) T-gel column equilibrated
with 20% (1.5 M) ammonium sulfate, 0.5 M sodium chloride, and 20 mM HEPES, pH
7.5. Elution of bound proteins was initiated during development of a reverse ammonium
sulfate gradient (20% → 0%). Relative elution order was monitored by absorbance at
280 nm.

well  as  nonpolar  and  aromatic  surface  patches  are  being  considered.  There  does  not 
appear,  as  yet,  to  be  any  one  overall  structural  feature  (e.g.  net  surface  charge  or 
isoelectric  point,  molecular  mass,  subunit  structure,  disulfide  linkages,  carbohydrate, 
etc.)  or  specific  surface  feature  which  is  common  to  the  thiophilic  or,  conversely,  the 
apparently  nonthiophilic  proteins  investigated. 

Thiophilic  Adsorption  in  Immunochemistry  and  Biotechnology 

Finally,  it  is  important  to  illustrate  just  how  selective  thiophilic  interaction  can  be. 
The  T  gel  can  be  utilized  alone  or  in  a  tandem  column  arrangement  with  other  gels  to 
accomplish  highly  selective  fractionations  of  complex  protein  mixtures  when  the  aim 
is the isolation of precious bioactive materials. We have already reported the selective
immobilization of human serum immunoglobulins [2]. It is also possible to isolate
monoclonal  antibodies  from  ascites  fluid  and  hybridoma  cell  culture  media  in  one 
rapid  step  using  mild  conditions  consistent  with  preserved  antibody  function. 

The  first  example  (Fig.  8)  illustrates  fractionation  of  hybridoma  cell  culture  media 
containing  a  monoclonal  antibody.  Only  a  relatively  small  fraction  of  the  proteins 
were  adsorbed  to  the  T  gel.  Upon  removal  of  the  water-structuring  salt  from  the 
column  eluent  solution  the  major  portion  of  the  proteins  adsorbed  came  out  in  a  sharp 
elution zone. Upon further washing (or upon inclusion of NaCl), a second elution region,
well separated from the first one, appeared in the eluate. This zone contained
most of the functional antibody present in the original sample. It was an almost pure
antibody as determined by analytical gel electrophoresis under native and denaturing
conditions (Fig. 9). In this way, by a single operation, nearly complete purification
can routinely be achieved under mild conditions with typical recoveries of over 90%.
This technique, simple as it is, represents a drastic improvement over existing methods
for isolation of monoclonal antibodies and is applicable on any scale, at least in
principle. The important point of this illustration, however, is not simply one of
immunoglobulin purification, but rather, selective recognition and immobilization of a
protein. The unanswered question remains the molecular mechanism by which a simple,
chemically defined ligand, such as that operative during thiophilic adsorption,
can recognize a protein surface with such specificity.

Figure 8. T-gel elution profile of monoclonal antibodies produced during hybridoma cell
culture (absorbance, 0.05 AUFS, versus elution volume). Pooled fractions were analyzed
for purity as shown in Figure 9.

A  significant  problem  associated  with  monoclonal  antibody  purification  from 
serum-dependent  hybridoma  cell  culture  media  has  been  removal  of  nonspecific  or 
background  (typically  bovine)  immunoglobulins.  In  efforts  to  circumvent  this  prob¬ 
lem  and  to  further  address  immunoglobulin-related  difficulties  with  serum-dependent 
cell  cultures  of  all  types,  we  have  evaluated  the  use  of  thiophilic  adsorption  to  selec¬ 
tively  eliminate  immunoglobulins  from  bovine  and  equine  sera.  Selective  removal  of 
immunoglobulins  present  in  bovine  serum  has  been  accomplished.  Bovine  sera  with 
immunoglobulin  levels  of  10-12  mg/mL  prior  to  thiophilic  adsorption  had  residual 
levels  at  or  below  1-5  ng/mL  after  a  single  thiophilic  adsorption  step.  The  calf 
serum  thus  purified  was  used  in  hybridoma  cell  culture  media  to  evaluate  cell  growth 
rates  and  production  of  monoclonal  antibodies.  These  properties  appeared  unaltered 
using culture media prepared with the purified versus untreated fetal calf serum [22].

Figure 9. Electrophoretic purity of monoclonal antibodies purified by thiophilic adsorption
as described for Figure 8. Panel A represents pooled fractions analyzed by gradient gel
electrophoresis under nondenaturing conditions and panel B represents the same fractions
analyzed by sodium dodecyl sulfate polyacrylamide gel electrophoresis. Proteins were localized
by Coomassie blue dye and silver staining. Experimental details are provided elsewhere [2].

Monoclonal  antibodies  present  in  the  culture  media  were  subsequently  purified 
to  near  homogeneity  in  one  step  using  thiophilic  adsorption  chromatography  as 
described  for  Figure  8.  We  are  also  investigating  the  use  of  T-gel-purified  bovine 
sera  for  the  more  efficient  growth  of  certain  viruses  in  culture.  We  therefore  consider 
thiophilic  adsorption  to  be  a  major  advancement  in  the  field  of  biotechnology  and  be¬ 
lieve  that  more  detailed  analyses  of  the  precise  interaction  mechanisms  involved  may 
lead  to  a  more  direct  approach  in  our  search  for  group-selective  protein  adsorbents. 
Abstract

The He I photoelectron (PE) spectra of the drugs antipyrine (phenazone, 1), aminopyrine (amidopyrine,
2), phenylbutazone (3), and indomethacin (4) are reported. These drugs possess analgesic-antipyretic and
anti-inflammatory activity, and have been in clinical use for many years. Their electronic structure, which
is important to their biological activity, is determined by the application of composite molecule methods to
an analysis of the PE spectra.


Introduction 

The  relief  of  pain  has  always  been  a  medical  priority.  The  chemical  (pharmacologi¬ 
cal)  approach  to  this  objective  was  (and  is)  to  discover  substances  that  provide  such 
relief.  As  a  result,  numerous  agents  have  been  tested  and  introduced  into  medicine. 
Unfortunately,  most  of  them  are  addictive,  that  is,  most  of  the  older  agents  are  nar¬ 
cotics.  During  the  last  century,  however,  numerous  non-narcotic  mild  analgesics  have 
been  discovered,  the  analgesic  property  often  being  accompanied  by  antipyretic  and 
anti-inflammatory  activity.  The  most  prominent  representatives  of  these  latter  agents 
belong  to  two  classes  of  compounds:  the  salicylates  and  the  p-aminophenols. 

In  a  recent  paper  [1],  we  used  the  experimental  technique  of  photoelectron  (PE) 
spectroscopy  and  the  theoretical  technique  of  the  composite  molecule  method  (cmm) 
[2]  to  investigate  the  electronic  structure  of  several  compounds  belonging  to,  or 
related  to  these  two  classes.  The  accent  in  the  investigation  was  placed  on  elucidation 
of  the  interactions  of  an  amidic  group  with  an  attached  phenyl,  the  phenyl  substituent 
being  appended  either  at  the  amidic  carbon  or  nitrogen  centers.  It  was  found  that  a 
substantial  difference  of  electronic  structure  is  associated  with  the  two  amidic  sites: 
the benzamides resemble the benzoic acid derivatives [3], whereas the anilides resemble
the anilines. The purpose of the present work is to extend these investigations to
some more complicated analgesic-antipyretic and anti-inflammatory agents, namely
antipyrine (1), aminopyrine (2), phenylbutazone (3), and indomethacin (4), the
last of which belongs formally to both groups [4]. Chemically speaking, 1-3 are pyrazolones
and indomethacin is an indole. However, the chemical structures reveal the existence
of a common part 5 which, with X = H, Y = Me, and Z = OH or OEt, is representative
of the well-known analgesics paracetamol or phenacetine, respectively. Since
we had already investigated the electronic structure of paracetamol and phenacetine
[1] as well as that of several pyrazolones [5] and indoles [6], it seemed appropriate to
ascertain whether different X, Y, and Z substituents introduced substantial changes of
electronic structure in the compounds 1-4 that might affect biological activities and
modes of action. Indeed, the electronic structure of drugs, as manifested
through their ionization potentials and/or calculated orbital energies, has been shown
repeatedly to correlate with biological activity [7-9].

Of the above compounds, 1 and 2 are the oldest, having been introduced into
medicine  at  the  end  of  the  1800s.  They  found  wide  use  as  antipyretics,  analgesics, 
and  anti-inflammatory  agents.  However,  since  compound  2  produces  agranulocytosis 
(bone  marrow  toxicity)  both  drugs  have  disappeared  from  the  market. 

Compound  3,  introduced  in  1949  for  treatment  of  rheumatoid  arthritis  and  related 
disorders,  is  an  effective  anti-inflammatory  agent  whose  use  is  limited  by  toxicity.  Its 
therapeutic  effects  are  similar  to  the  salicylates  but,  because  of  its  toxicity,  it  should 
not  be  used  routinely  as  either  analgesic  or  antipyretic.  It  undergoes  extensive  meta¬ 
bolic  transformation  in  humans,  the  most  significant  primary  reactions  involving 
glucuronidation  and  phenyl-ring  hydroxylation.  Thus,  oxyphenbutazone,  the  p-hydroxy- 
lated  metabolite  (Z  =  OH  in  5)  with  activity  similar  to  that  of  the  parent  drug,  is  also 
offered  on  the  market.  Interestingly,  compound  4  evolved  from  an  organized 
laboratory  search  for  anti-inflammatory  drugs  and  has  been  in  use  since  1963.  The 
anti-inflammatory  effects  are  evident  in  the  treatment  of  rheumatoid  and  other  types 
of  arthritis,  including  acute  gout.  Indomethacin  exhibits  central  and  peripheral  anal¬ 
gesic  properties  which  are  distinct  from  its  anti-inflammatory  effects;  its  antipyretic 
effect  is  demonstrable  in  patients  with  fever.  More  about  1-4  and  related  agents  can 
be  found  in  Ref.  10. 


Experimental 

Compounds.  Compounds  1-4  were  of  high  purity  and  were  supplied  by  the  Bureau 
of  Drug  Control,  Zagreb. 

Spectra. The He I PE spectra were recorded (FWHM ~ 20 meV) on a Vacuum
Generators UV-G3 spectrometer [8]. Temperatures of 120, 100, 170, and 190°C at
the inlet system were used in studies of 1, 2, 3, and 4, respectively. The energy scale
was calibrated using mixtures of Ar and Xe.
Results

The low-resolution PE spectra of the pyrazolones 1-3 are shown in Figure 1 and
that of compound 4 is given in Figure 2. The expanded low-energy part of the spectra
of compounds 1 and 2 is shown in Figure 3. Numbers at the top of PE bands correspond
to vertical ionization energies, $E_{i,v}$/eV. A correlation of the electronic energies
of aniline [16], acetanilide [1], 3, 1,2-diphenylpyrazoline [13], hydrazobenzene [13],
and 1 and 2 is given in Figure 4; that of indole-3-acetic acid [6], indole [6],
2-methylindole [6], 2-methyl-5-methoxyindole [6], and 4 is given in Figure 5.
Figure 1. He I photoelectron spectra of (top to bottom): antipyrine (1), aminopyrine (2),
and phenylbutazone (3).
Discussion

Besides the previous work on pyrazolones [5] and indoles [6], the substituted
hydrazine  studies  by  Kimura  and  co-workers  [12],  Rademacher  and  co-workers  [13], 
and  Nelsen  and  Buschek  [14]  are  of  great  relevance  for  the  present  analysis.  These 
authors  found  a  strong  dependence  of  the  two  lone-pair  splittings  on  dihedral  angle 
(i.e.,  on  the  chemical  conformation).  These  findings,  later  confirmed  by 
nuclear  magnetic  resonance  (nmr)  [14]  and  quantum  chemical  calculations  [12, 13], 
led  to  deduction  of  the  gas  phase  conformations.  A  comprehensive  overview  of  n-n 
conjugation  in  hydrazines  and  hydrazobenzenes  has  been  given  by  Brown  and 
Jorgensen  [15]. 

A conjunction of our own results [1] for aniline, acetanilide, hydrazobenzene, and
1,2-diphenylpyrazoline with those of Rademacher, Bass, and Wildemann [13] leads
immediately to an electronic structure for 3 (Fig. 4). The relevant orbitals are: (i) the
antibonding ($\pi_1$) and bonding ($\pi_3$) molecular orbitals of aniline that arise from an
interaction of the amine substituent with the $b_1$ component of the degenerate $e_{1g}$ $\pi$ orbital
of benzene; (ii) the $\pi_2$ orbital, largely the $a_2$ component of $e_{1g}$, which is unaffected by
substitution at a nodal ring center; and (iii) the amidic carbonyl lone-pair orbitals, $n_O$.
The $\pi_2$ and $n_O$ orbitals remain essentially unaltered from one compound to another.
However, the addition of two anilines to form hydrazobenzene splits each of the $\pi_1$
and $\pi_3$ orbitals into two components, in accord with the degree of $n_N^+$ and $n_N^-$
contributions contained in the resultant orbitals. Furthermore, this splitting is very sensitive
to conformation. Finally, given that the environment of the 5-membered ring is much
Figure 4. Correlation diagram for the observed vertical ionization energies, $E_{i,v}$/eV, of
aniline [16], acetanilide [1], phenylbutazone (3), 1,2-diphenylpyrazoline [13], and
hydrazobenzene [13].
the same in 1,2-diphenylpyrazoline and in 3, we can assume the electronic structure
to be very similar in both cases. A synthesis of these considerations, together with the fact
that the introduction of a -COR group to an aromatic amine generally causes an
increase of ionization energies and produces a new $n_O$ ionization at about 9.5 eV,
leads to the following assignments for 3: $E_i$/eV: 7.95 ($\pi_1^+$) < ~8.8 ($\pi_1^-$) < ~9.3 ($2\pi_2$)
< 9.5 ($2n_O$) < 10.33 ($\pi_3^-$) < 10.9 ($\pi_3^+$).

Assignments for 1 and 2 do not come readily for a number of reasons. First, very
little PE data exist for mixed hydrazines that contain one aromatic and one aliphatic
substituent. This, of course, complicates all arguments by extrapolation. Second, several
conformations of the same molecule can exist in an equilibrium mixture. For example,
our own studies [3] of the tautomerism of the CH-, OH-, and NH- forms of pyrazolones
in the gas phase suggest a preference for the CH form when all other forms
can coexist and that, in general, the NH form is the least stable. Compounds 1 and 2
are forced into an NH form as a result of appropriate substitution. From the expanded
low-energy parts of the PE spectra of 1 and 2, we may tentatively assign:

for 1: $E_i$/eV: 7.78 ($\pi_1^+$); 8.5 ($\pi_1^-$); 9.1 ($\pi_2$); 9.3 ($n_O$); 10.12 and 11.09 ($\pi_3$ and $\pi_{CC}$)
for 2: $E_i$/eV: 7.5 ($n_{NMe_2}$); 7.87 ($\pi_1^+$); 8.63 ($\pi_1^-$); 9.08 ($\pi_2$); 9.47 ($n_O$); 9.99 and 10.89
($\pi_3$ and $\pi_{CC}$)

It would be remiss not to point out the deficiencies in this assignment for compound 2.
For example, the $n_{NMe_2}$ ionization is expected at 8.5 eV. Its appearance at
7.5 eV seems to be a result of interaction of the dimethylamino group with the neighboring
carbonyl. The corresponding $n_O$ ionization becomes broad in shape and is
shifted to higher energy. On the other hand, an assignment of the 7.5 and 7.87 eV
events to $\pi_1^+$ and $\pi_1^-$, respectively, would, because of the negligible splitting, indicate
a completely different conformation of 2 versus 1 and 3. Obviously, additional work
is mandated.
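The splitting argument can be made concrete with simple arithmetic on the ionization energies quoted above. The sketch below is illustrative only: the pairing of bands into "plus" and "minus" components follows the tentative assignments in the text, and the dictionary keys are labels chosen here for convenience.

```python
# Pairs of vertical ionization energies (eV) quoted in the text, taken as the
# two components of the split pi_1 level under each tentative assignment.
pi1_pairs = {
    "1 (antipyrine)": (7.78, 8.5),
    "2 (aminopyrine), assignment given": (7.87, 8.63),
    "2 (aminopyrine), alternative": (7.5, 7.87),
    "3 (phenylbutazone)": (7.95, 8.8),
}

# Splitting = energy difference between the two components.
splittings = {name: round(high - low, 2) for name, (low, high) in pi1_pairs.items()}

for name, delta in splittings.items():
    print(f"{name}: splitting = {delta:.2f} eV")
```

Under the assignment given, the splitting in 2 is close to that in 1 and 3; under the alternative it is roughly half as large, which is why the alternative would imply a different conformation.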

The assignment of the PE spectrum of 4 follows from that of indole, 2-methylindole,
2-methyl-5-methoxyindole, indole-3-acetic acid, and the general effects of
N-acylation on PE spectra (Fig. 5). The shape of relevant indole orbitals is shown in
Figure 5: in indole-3-acetic acid and 2-methylindole, the ionization energies
decrease relative to the parent indole; and an additional ionization event, one associated
with the lone pair of the carbonyl group [i.e., $I(n_{OAc})$], should appear at ~10.3 eV
in the acid. Furthermore, methoxy substitution of the 2-methyl derivative in the
5-position should destabilize $\pi_1$ slightly, and $\pi_2$ and $\pi_3$ not at all (because of a node at the
Figure 5. Correlation diagram for the observed vertical ionization energies, $E_{i,v}$/eV, of
indole-3-acetic acid [6], indole [6], 2-methylindole [6], 2-methyl-5-methoxyindole [6], and
indomethacin (4).


point of substitution in $\pi_3$). Finally, the fact that benzoylation of the nitrogen, with a
p-chlorobenzoyl entity, is known [1] to increase the ionization energies, leads to a
facile assignment: the three ionization events that lie below 10 eV [two associable
with the benzene ring, namely $\pi_{Bz}(b_1)$ and $\pi_{Bz}(a_2)$ at 9.04 and ~9.6 eV, respectively,
and one associable with the CO group of benzoyl, namely $I(n_O)$] are immediately
assignable. The other $I(n_{OAc})$ event is observed as a shoulder at ~10.3 eV, and,
of course, the $I(n_{Cl})$ events occur at 11.4 eV.

Summary 

The application of a composite molecule method to an analysis of the PE spectra of
some rather complex drugs has been shown to yield some definitive insights into the
electronic structure of those drugs, despite the fact that some important intermediaries
(i.e., component molecules) had neither been studied nor analyzed. In part, this success
is attributable to the fact that all four drugs possess a common molecular fragment,
5, and that this fragment is mainly responsible for the comparatively low
observed ionization potentials (<8 eV).

In  the  absence  of  detailed  knowledge  of  the  specific  steps  in  the  drug  mechanism 
of  these  compounds,  it  is  therefore  reasonable  to  suppose  that  all  these  drugs  function 
as  electron  donors. 


Acknowledgment 

This  work  was  supported  by  the  Research  Council  of  SR  Croatia  (SIZ-za  znanost), 
the  U.S.  National  Institutes  of  Health  (NIH),  and  the  U.S.  Department  of  Energy. 




KLASINC  ET  AL. 


Bibliography 

[1] L. Klasinc, I. Novak, •> Ijic, and S. P. McGlynn, Int. J. Quantum Chem. QBS13, 251 (1986).

[2] S. P. McGlynn, L. Vanquickenborne, D. Carroll, and M. Kinoshita, Introduction to Applied Quantum
Chemistry (Holt, Rinehart and Winston, New York, 1971).

[3] T. Meeks, A. Wahlborg, and S. P. McGlynn, J. Electron Spectrosc. 22, 43 (1981).

[4] D. M. Woodbury and E. Fingl, in The Pharmacological Basis of Therapeutics, L. S. Goodman and
A. Gilman, Eds. (Macmillan, New York, 1975), 5th Edition, pp. 325-358.

[5] G. Kluge, G. Kania, F. Achenbach, H. Wilde, I. Novak, and L. Klasinc, Int. J. Quantum Chem.
QBS11, 237 (1984).

[6] H. Güsten, L. Klasinc, J. V. Knop, and N. Trinajstić, in Excited States of Biological Molecules,
J. Birks, Ed. (Wiley, Chichester, 1975), p. 45; H. Güsten, L. Klasinc, and B. Ruščić, Z. Naturforsch.
31a, 1051 (1976).

[7] S. H. Snyder and C. R. Merril, Proc. Natl. Acad. Sci. U.S.A. 54, 258 (1965); see also Molecular
Orbital Studies in Chemical Pharmacology, L. B. Kier, Ed. (Springer, W. Berlin, 1970).

[8] L. N. Domelsmith, L. L. Munchausen, and K. N. Houk, J. Am. Chem. Soc. 99, 4311, 6506
(1977); L. N. Domelsmith and K. N. Houk, Int. J. Quantum Chem. QBS5, 257 (1978); L. N. Domelsmith,
L. L. Munchausen, and K. N. Houk, J. Med. Chem. 20, 1346 (1977); L. N. Domelsmith,
T. A. Eaton, K. N. Houk, G. M. Anderson III, R. A. Glennon, A. T. Shulgin, N. Castagnoli, and
P. A. Kollman, J. Med. Chem. 24, 1414 (1981).

[9] V. Butković, B. Kovač, I. Novak, B. Ruščić, A. Sabljić, L. Klasinc, and S. P. McGlynn, in Modelling
of Structure and Properties of Molecules, Z. B. Maksić, Ed. (Ellis Horwood Ltd., Chichester),
in press; L. Klasinc, B. Kovač, E. Polla, and S. Mutak, Acta Pharm. Jugosl. 37, 67 (1987) (part 13
of the series).

[10] Symposium on "Anti-Rheumatic Drugs," E. C. Huskisson, Ed. (Praeger Publishers, New York,
1983).

[11] L. Klasinc, B. Kovač, and B. Ruščić, Kem. Ind. (Zagreb) 23, 569 (1974).

[12] K. Kimura and K. Osafune, Bull. Chem. Soc. Japan 48, 2421 (1975); K. Kimura, S. Katsumata,
and K. Osafune, Bull. Chem. Soc. Japan 48, 2736 (1975).

[13] P. Rademacher, V. M. Bass, and M. Wildemann, Chem. Ber. 110, 1939 (1977).

[14] S. F. Nelsen and J. M. Buschek, J. Am. Chem. Soc. 95, 2011, 2013 (1973).

[15] R. S. Brown and F. S. Jorgensen, in Electron Spectroscopy: Theory, Techniques and Applications,
Vol. 5, C. R. Brundle and A. D. Baker, Eds. (Academic Press, London, 1984), pp. 2-120.

[16] L. Klasinc, B. Kovač, and H. Güsten, Pure Appl. Chem. 55, 289 (1983).


Received  June  15,  1987 


On the Use of the Weighted Identification Numbers
in the QSAR Study of the Toxicity of Aliphatic Ethers

B. BOGDANOV,* S. NIKOLIC, A. SABLJIC, AND N. TRINAJSTIC

The Rugjer Bošković Institute, P.O.B. 1016, 41001 Zagreb, Croatia, Yugoslavia


S. CARTER

Department of Chemistry, The University of Reading, Reading RG6 2AD, England, United Kingdom


Abstract 

We have examined an application of the weighted identification number in the QSAR study of the
toxicity of aliphatic ethers on mice. The results obtained are superior to those achieved by the
connectivity index.


Introduction 

Recently  a  novel  graph-theoretical  index,  known  as  the  weighted  identification 
(WID)  number,  has  been  introduced  [1],  which  appears  to  be  a  highly  selective  struc¬ 
tural  descriptor.  It  is  also  worth  noting  that  the  WID  number  can  be  computed 
straightforwardly  for  any  structure  [1,2]. 

We  decided  to  test  the  applicability  of  the  WID  number  in  quantitative  structure- 
activity  relationships  (QSAR)  studies.  The  toxicity  of  aliphatic  ethers  was  selected 
for  this  purpose  because  the  problem  has  already  been  treated  with  the  connectivity 
index  of  Randic  [3]  with  some  success  [4].  The  connectivity  index  is  so  far  the  most 
successful  graph-theoretical  descriptor  used  in  QSAR  work  [5, 6]  and  thus  we  will  be 
able  to  investigate  how  the  WID  number  performs  in  comparison  with  the  connec¬ 
tivity  index  on  the  same  sample. 
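For orientation, the connectivity index of Randić is commonly computed as a sum over edges of $(\delta_i \delta_j)^{-1/2}$, where $\delta_i$ is the degree of vertex $i$ in the hydrogen-suppressed graph. The sketch below assumes this simple first-order form and treats the oxygen atom as an ordinary vertex; whether the earlier ether study [4] used this form or a heteroatom-corrected variant is not stated here, and the function name is hypothetical.

```python
from math import sqrt


def connectivity_index(edges):
    """First-order connectivity index: sum over edges of (deg_i * deg_j) ** -0.5."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return sum(1.0 / sqrt(degree[a] * degree[b]) for a, b in edges)


# Hydrogen-suppressed graph of diethyl ether, C-C-O-C-C, as a 5-vertex path.
# Assumption: the oxygen (vertex 3) is treated like any other vertex.
diethyl_ether = [(1, 2), (2, 3), (3, 4), (4, 5)]
chi = connectivity_index(diethyl_ether)
print(f"connectivity index = {chi:.4f}")
```

Because this index depends only on vertex degrees, distinct structures can share the same value, which is one motivation for more selective descriptors.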

People have been interested in the anesthetic activity of aliphatic ethers since the
discovery of diethyl ether in 1542 [7]. Interest has been particularly focused on the
toxicities of ethers [8]. We will consider the correlation between the WID number
and the toxicities (pC) of a set of 21 aliphatic ethers on mice [4, 8] in an attempt to
produce a QSAR model of predictive power. We will also carry out calculations with
the connectivity index using exactly the same types of regression analyses as those
employed for the WID number.
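A comparison of two descriptors on the same sample can be sketched as a pair of least-squares fits judged by correlation coefficient and standard error. The example below assumes a simple linear model pC = a·x + b; the regression forms actually used in this study are not specified in this passage, and the numbers are placeholders, not the measured toxicities or descriptor values.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r, s): r is the correlation coefficient and
    s the standard error of the estimate with n - 2 degrees of freedom.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    residual_ss = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(residual_ss / (n - 2))
    return slope, intercept, r, s


# Placeholder data (hypothetical; NOT values from this study).
descriptor = [3.0, 4.0, 5.0, 6.0, 7.0]
pc_values = [1.9, 2.3, 2.6, 3.1, 3.4]

slope, intercept, r, s = fit_line(descriptor, pc_values)
print(f"pC = {slope:.3f} x + {intercept:.3f}; r = {r:.3f}, s = {s:.3f}")
```

Running the same function once per descriptor on identical pC values gives directly comparable r and s figures.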


* Permanent address: Department of Chemistry, University of Skopje, Skopje, Macedonia, Yugoslavia


INTERNATIONAL JOURNAL OF QUANTUM CHEMISTRY: QUANTUM BIOLOGY SYMPOSIUM 14, 325-330 (1987)

© 1987 by John Wiley & Sons, Inc. CCC 0360-8832/87/010325-06$04.00




The  WID  Number 


In presenting a brief derivation of the WID number we will use graph-theoretical
language for convenience [9, 10]. The incentive for the development of the WID
number was Randić’s molecular identification (ID) number [11] and its successful
application in QSAR studies [12, 13]. However, isomeric trees were found with the
same ID number [1]. This fact stimulated us to look for a number with much greater
selectivity than Randić’s ID number and, if such a number could be found, to investigate
whether it could be used in QSAR studies. Let $G = (V, E)$ be a graph with the
vertex-set $V = V(G)$ and the edge-set $E = E(G)$. Let $V = (v_1, v_2, \ldots, v_N)$ be a
labelling of $V$. The distance between the vertices $v_i$ and $v_j$ is denoted by $d(i,j)$. Note
that $d(i,j) = d(j,i)$, and $d(i,i) = 0$. Distances $d(i,j)$ are elements of the distance
matrix of $G$, $D = D(G)$ [14-16]. The distance-sum $D_i$ in $D$ is defined by [17-19]:

N 

D,  =  'ZdOjY,  IrStsN  (1) 

i- 1 


The distance sum has been used by Seybold [20] as a measure of the compactness or
centrality of a particular site in a molecule. The distance sums may be easily obtained
with any of several available computer-oriented algorithms for constructing the
distance matrix for any structure [21,22] and they simply represent the sums of the
elements in the rows (or columns) of the distance matrix. The weights of edges $w_{ij}$ in
G are defined as [3,18]:

$$w_{ij} = \begin{cases} (D_i D_j)^{-1/2} & \text{if } d(i,j) = 1;\ 1 \le i \le N,\ 1 \le j \le N \\ 0 & \text{otherwise} \end{cases} \tag{2}$$


They represent the elements of the matrix of weights, W = W(G). Let
$w = (v_{i_1}, v_{i_2}, \ldots, v_{i_{k+1}})$ be a walk [9] of length k. The weight of this walk is defined by:

$$\prod_{l=1}^{k} w_{i_l i_{l+1}} \tag{3}$$

The weight of all walks of length k between vertices $v_{i_1}$ and $v_{i_{k+1}}$ is given by:

$$\sum_{w} \prod_{l=1}^{k} w_{i_l i_{l+1}} \tag{4}$$


The entry $w^{(k)}_{ij}$ of $W^k$ is the sum of the weights of all walks of length k between
the vertices $v_i$ and $v_j$. The WID number of G is then defined as follows:

$$WID(G) = N - (1/N) + (1/N)^2 \cdot ID^*(G) \tag{5}$$

where:

$$ID^*(G) = \sum_{i=1}^{N} \sum_{j=1}^{N} w^*_{ij} \tag{6}$$

Here $w^*_{ij}$ are the elements of the matrix $W^*$, and:

$$W^* = \sum_{k=0}^{N-1} W^k \tag{8}$$

Note  that  because  of  the  way  in  which  the  WID  number  is  constructed,  its  limits  are: 

N  <  WID(G)  <  N  +  1  (9) 

for  each  graph  G  with  N  vertices. 

We have devised a computer program for calculating the WID numbers which
starts with the distance matrix and proceeds via $D_i$, W, $W^*$, and $ID^*$, and finally ends
with the WID number [23,24].
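
The sequence of steps just described can be sketched in a few lines of Python. This is
only an illustrative sketch, not the program of Refs. 23 and 24: the function names are
our own, the graph is supplied as a 0/1 adjacency matrix, it is assumed to be connected
with at least two vertices, and heteroatoms are treated as ordinary vertices (an
assumption that reproduces the Table I value for dimethyl ether).

```python
def distance_matrix(adjacency):
    """Shortest-path distances d(i,j) for a connected, unweighted graph."""
    n = len(adjacency)
    inf = float("inf")
    dist = [[0 if i == j else (1 if adjacency[i][j] else inf)
             for j in range(n)] for i in range(n)]
    for k in range(n):                      # Floyd-Warshall relaxation
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def mat_mul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def wid_number(adjacency):
    """WID number of a connected graph with N >= 2 vertices."""
    n = len(adjacency)
    dist = distance_matrix(adjacency)
    dsum = [sum(row) for row in dist]                        # distance sums, Eq. (1)
    weights = [[(dsum[i] * dsum[j]) ** -0.5 if adjacency[i][j] else 0.0
                for j in range(n)] for i in range(n)]        # edge weights, Eq. (2)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # W^0
    id_star = 0.0
    for _ in range(n):                                       # k = 0, ..., N-1, Eq. (8)
        id_star += sum(map(sum, power))                      # all entries of W^k, Eq. (6)
        power = mat_mul(power, weights)
    return n - 1.0 / n + id_star / n ** 2                    # Eq. (5)


def path_graph(n):
    """Adjacency matrix of an unbranched chain of n vertices."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


# Three-vertex chain C-O-C (dimethyl ether skeleton); Table I lists 3.29255.
print(round(wid_number(path_graph(3)), 5))
```

The powers of W are accumulated one at a time, so only the current power and a running
sum of entries need to be stored.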


Results  and  Discussion 

The toxicities of 21 aliphatic ethers together with their WID numbers and connectivity
indices are given in Table I. The WID numbers are calculated as shown in the
previous section. The connectivity index χ is calculated by the following formula [3]:

$$\chi = \sum_{\text{bonds}} (m_i n_j)^{-1/2} \tag{10}$$

where $m_i$ and $n_j$ are the valencies of the endpoints of the bond i-j.
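
Formula (10) is simple enough to evaluate directly. The following Python sketch is an
illustration only; the helper names and the bond-list input format are our own
assumptions, and hydrogen atoms are suppressed so that the valency of a vertex is the
number of its non-hydrogen neighbours.

```python
def graph_from_bonds(n_vertices, bonds):
    """Build a 0/1 adjacency matrix from a list of (i, j) bonds."""
    adjacency = [[0] * n_vertices for _ in range(n_vertices)]
    for i, j in bonds:
        adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


def connectivity_index(adjacency):
    """Connectivity index: sum over bonds of (m_i * n_j)^(-1/2), Eq. (10)."""
    n = len(adjacency)
    valency = [sum(row) for row in adjacency]
    chi = 0.0
    for i in range(n):
        for j in range(i + 1, n):           # visit each bond once
            if adjacency[i][j]:
                chi += (valency[i] * valency[j]) ** -0.5
    return chi


# Dimethyl ether skeleton C-O-C
dimethyl = graph_from_bonds(3, [(0, 1), (1, 2)])
# Methyl isopropyl ether skeleton C-O-C(C)C
methyl_isopropyl = graph_from_bonds(5, [(0, 1), (1, 2), (2, 3), (2, 4)])
print(round(connectivity_index(dimethyl), 3))
print(round(connectivity_index(methyl_isopropyl), 3))
```

For these two skeletons the sketch gives the values listed in Table I (1.414 and 2.270).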


Table I. Toxicities of aliphatic ethers (R1-O-R2) on mice (pC) and the corresponding
WID numbers and connectivity indices.

Ether               pC(a)    WID        χ
Dimethyl            1.43     3.29255    1.414
Methyl ethyl        1.74     4.12444    1.914
Methyl propyl       2.45     5.05815    2.414
Methyl isopropyl    2.26     5.07386    2.270
Methyl butyl        2.70     6.03157    2.914
Methyl isobutyl     2.79     6.03985    2.770
Methyl secbutyl     2.79     6.03723    2.808
Methyl terbutyl     2.79     6.05011    2.561
Methyl pentyl       2.88     7.01917    3.414
Diethyl             2.22     5.05815    2.414
Ethyl propyl        2.60     6.03157    2.914
Ethyl isopropyl     2.60     6.03723    2.700
Ethyl butyl         2.82     7.01917    3.414
Ethyl isobutyl      2.82     7.02315    3.270
Ethyl secbutyl      2.85     7.02154    3.308
Ethyl terbutyl      2.92     7.02712    3.061
Ethyl pentyl        3.00     8.01257    3.914
Ethyl terpentyl     3.15     8.01811    3.621
Dipropyl            2.79     7.01917    3.414
Propyl isopropyl    2.82     7.02154    3.270
Di-isopropyl        2.82     7.02449    3.126

(a) Refs. 4 and 8.
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We  examined  two  types  of  correlations:  (a)  linear  least-squares  fit 

$$pC = a + b \cdot I \tag{11}$$

and (b) quadratic least-squares fit

$$pC = a + b \cdot I + c \cdot I^2 \tag{12}$$

where I = WID or χ. The results of the above regression analyses are given in
Tables II and III.
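
For readers who wish to repeat fits of the form (11) and (12), a minimal Python sketch
is given below. It is not the software used for Tables II and III. The function names
are our own, and the definitions of r, s, F, and adjusted r² are the usual ones for
least-squares regression with p fitted powers of I and n data points; we assume these
are the definitions behind the tabulated statistics.

```python
import math


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def polynomial_fit(index, pc, degree):
    """Least-squares fit pC = a + b*I (+ c*I^2) with summary statistics."""
    n, p = len(index), degree
    # Normal equations (X^T X) beta = X^T y for the columns 1, I, I^2, ...
    xtx = [[sum(xi ** (a + b) for xi in index) for b in range(p + 1)]
           for a in range(p + 1)]
    xty = [sum(yi * xi ** a for xi, yi in zip(index, pc)) for a in range(p + 1)]
    beta = solve(xtx, xty)
    fitted = [sum(c * xi ** k for k, c in enumerate(beta)) for xi in index]
    mean_y = sum(pc) / n
    sst = sum((yi - mean_y) ** 2 for yi in pc)              # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(pc, fitted))  # residual sum of squares
    dof = n - p - 1                                         # residual degrees of freedom
    r2 = 1.0 - sse / sst
    return {
        "coefficients": beta,                               # [a, b] or [a, b, c]
        "r": math.sqrt(max(r2, 0.0)),
        "s": math.sqrt(sse / dof),
        "F": float("inf") if sse == 0 else ((sst - sse) / p) / (sse / dof),
        "r2_adjusted": 1.0 - (1.0 - r2) * (n - 1) / dof,
    }
```

Calling `polynomial_fit(wid_values, pc_values, 1)` or `polynomial_fit(wid_values, pc_values, 2)`
with the WID and pC columns of Table I (here the hypothetical lists `wid_values` and
`pc_values`) corresponds to cases (a) and (b), respectively.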

From the statistical data in Tables II and III, we see that the WID number is superior
to the connectivity index. It appears, both from earlier comparative studies [25]
and from this work, that of all single graph-theoretical indices used for the correlation
with the toxicities of aliphatic ethers, the most promising QSAR model is achieved
with the WID number. Even when a polyparametric regression equation with several
kinds of graph-theoretical indices is employed, the quality of the QSAR model with
only the WID number is unsurpassed.

One  possible  reason  for  this  good  performance  of  the  WID  number  is  discussed 
here.  If  we  carry  out  the  regression  analyses  by  using  only  the  number  of  atoms  in 
the  ether,  the  following  statistical  equations  are  obtained  for  the  two  cases  considered 
above: 


$$pC = 1.002 + 0.314 \cdot N; \quad r = 0.947, \quad s = 0.133, \quad F^{1,19} = 166, \quad r^2\ (\text{adjusted}) = 0.892 \tag{13}$$


Table II. Statistical characteristics of a linear relationship between the aliphatic ether
toxicities on mice and the WID numbers and connectivity indices.

Statistical data

I      a        b        r        s        F(1,19)   r² (adjusted)
WID    0.602    0.325    0.942    0.139    149       0.881
χ      0.792    0.633    0.909    0.172    91        0.818

Table III. Statistical characteristics of a quadratic relationship between the aliphatic
ether toxicities and the WID numbers and connectivity indices.

Statistical data

I      a         b        c         r        s        F(2,18)   r² (adjusted)
WID    -1.321    1.020    -0.060    0.976    0.090    181       0.947
χ      -1.019    2.046    -0.261    0.955    0.123    93        0.902

$$pC = 0.027 + 0.767 \cdot N - 0.048 \cdot N^2; \quad r = 0.975, \quad s = 0.092, \quad F^{2,18} = 174, \quad r^2\ (\text{adjusted}) = 0.945 \tag{14}$$

These results are rather good, and one may be tempted to recommend the number of
atoms for use in many QSAR models. However, the number of atoms is a descriptor
of low discriminatory power; it cannot, of course, differentiate isomeric
molecules.

From  the  structure  of  the  formula  for  the  WID  number  [see  Eq.(5)],  we  see  that 
this  number  is  rather  closely  related  to  the  number  of  atoms  in  a  molecule.  Hence, 
the  WID  number  could  be  simply  presented  as: 

WID  =  N  +  corr.  (15) 

where: 

$$\text{corr.} = -(1/N) + (1/N)^2 \cdot ID^*(G) \tag{16}$$

The correction (16) is rather small for a great number of chemical graphs. It will
increase with the increasing complexity of a graph. The superiority of the WID
over N is clear in the case of isomers. For example, all 366,319 isomers of the C20H42
alkane are differentiated by the WID number while they all have the same N = 20.
This sensitivity of the WID number is related to the small correction given in
expression (16).
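
To give a feeling for the size of the correction (16), the arithmetic can be written out
for the three-vertex chain that represents dimethyl ether. This is a sketch under two
assumptions: the oxygen is treated as an ordinary vertex, and the edge weights are
$(D_i D_j)^{-1/2}$ with ID* taken as the sum of all entries of $W^0 + W^1 + W^2$.

```latex
\begin{align*}
D_1 &= D_3 = 3, \quad D_2 = 2, \qquad w_{12} = w_{23} = (3 \cdot 2)^{-1/2} = 6^{-1/2} \\
\sum_{i,j} (W^0)_{ij} &= 3, \qquad
\sum_{i,j} (W^1)_{ij} = 4 \cdot 6^{-1/2} \approx 1.63299, \qquad
\sum_{i,j} (W^2)_{ij} = 6 \cdot \tfrac{1}{6} = 1 \\
ID^*(G) &\approx 3 + 1.63299 + 1 = 5.63299 \\
\text{corr.} &= -\tfrac{1}{3} + \tfrac{1}{9} \cdot 5.63299 \approx 0.29255 \\
WID(G) &= N + \text{corr.} \approx 3 + 0.29255 = 3.29255
\end{align*}
```

The result agrees with the WID value listed for dimethyl ether in Table I.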

In the past, very discriminative graph-theoretical indices have been found not to be
particularly useful in QSAR studies. The large amount of structural information
contained in such graph-theoretical indices may obscure those factors that are significant
for a particular property that is to be modelled via QSAR technology. A good example
to illustrate this point is Balaban's index [18], a highly discriminative
descriptor which has so far shown little use in QSAR studies [26].

Concluding  Remarks 

We wish to point out that the WID number has many good features for applications
to QSAR studies. It is the most discriminative graph-theoretical index that has
been found to date. At the same time, the WID number is designed to avoid an excess of
structural information, which might obscure its use in constructing QSAR models.
Therefore, the WID number shows potential for use in QSAR work. However, more work
is needed before the range of its applicability is established. Some research in this
direction is already in progress [27].
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Abstract 

Proton nuclear magnetic resonance (NMR) signals may be obtained from the human body in such a way
as to produce images of anatomical slices. The NMR signal is mapped in the selected slice by application of
field gradients which provide spatial encoding. Excellent NMR images are now obtained which are useful in
clinical practice. Introduction of NMR imaging into major hospitals is proceeding rapidly, with 10 commercial
companies supplying the market; over 500 whole-body systems have been installed worldwide. As a
method of medical imaging, NMR has the advantage that it uses no ionizing radiation and is therefore
inherently safe; moreover, it gives sections in transverse, coronal, and sagittal planes with equal ease, and has
good tissue contrast and pathological contrast arising from relaxation time differences. Contrast may be
improved by use of contrast agents. Current trends suggest that before very long, whole-body NMR systems
will be found in all major hospitals.

It is of great importance to the physician to be able to view the inside of the human
body in order to deal effectively with diseases by medical or surgical treatment. One
of the best ways to accomplish this is by use of x-rays, and now, with the development
of computed tomography (CT) x-ray scanning, extremely good images can be
obtained. However, x-rays do have a serious disadvantage; they are an ionizing
radiation and therefore they can do us some harm in the course of doing us some good.

On the other hand, obtaining human images by magnetic resonance imaging (MRI) has the
great advantage of not using ionizing radiation and is therefore inherently safer.
Moreover, in addition to being nonhazardous, magnetic resonance images offer
improved diagnostic information through tissue discrimination and pathological
discrimination arising from differences in relaxation time. Furthermore, one can obtain
images in any desired plane (transverse, sagittal, or coronal) with equal ease. With CT
x-ray scanners, one can obtain direct images only in the transverse plane.

When obtaining human images by nuclear magnetic resonance (NMR), we place the
body in a magnetic field and obtain an image of the distribution of the hydrogen
nuclei in a slice of the body. Therefore, it is necessary to first define the slice to be
imaged. This is done by the selective excitation method [1,2]. The body is placed in
a magnetic field with a field gradient along the Z direction. The body is then
irradiated with a 90° resonant NMR pulse of narrow spectral width so that only nuclei in a
narrow range of Z values are excited. In this way a slice in the body is defined and
only the protons in this slice are excited.
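
The relation between the spectral width of the pulse and the thickness of the excited
slice can be illustrated with a short Python sketch. It assumes the standard Larmor
relation (frequency proportional to field) and an ideal rectangular excitation profile;
the function names and the example bandwidth and gradient strength are our own
illustrative choices, not values from this work.

```python
GAMMA_BAR_PROTON = 42.577e6  # proton gyromagnetic ratio divided by 2*pi, in Hz per tesla


def larmor_frequency(field_tesla):
    """Proton resonance frequency (Hz) in a given magnetic field."""
    return GAMMA_BAR_PROTON * field_tesla


def slice_thickness(pulse_bandwidth_hz, gradient_tesla_per_m):
    """Thickness (m) of the slice excited by a pulse of given spectral width.

    Along Z the resonance frequency changes by GAMMA_BAR_PROTON * gradient per
    metre, so a pulse covering pulse_bandwidth_hz excites this range of Z.
    """
    return pulse_bandwidth_hz / (GAMMA_BAR_PROTON * gradient_tesla_per_m)


# Illustrative numbers only: a 1 kHz wide pulse in a 5 mT/m gradient.
print(larmor_frequency(0.15))        # resonance frequency at 0.15 tesla, in Hz
print(slice_thickness(1000.0, 0.005))  # slice thickness in metres
```

A narrower pulse bandwidth or a stronger gradient gives a thinner slice.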

The field gradient is then immediately switched into the plane of the slice, along,
say, the X direction, and the NMR free induction decay evolves during a time $t_x$. The
NMR signal from each volume element depends on the proton density there and its
frequency depends on its x coordinate. After time $t_x$ the gradient is suddenly switched
along the Y direction and the NMR signal continues to evolve during the subsequent
time $t_y$. The NMR signal from each volume element depends now not only on its x
coordinate (which determined its phase at time $t_x$) but also on its y coordinate, which
determines its frequency in $t_y$. Every volume element in the defined slice is thus
labelled or encoded according to its x and y coordinates. The NMR free induction
decay is recorded during $t_y$ as n data points.

The procedure is now repeated for n different values of $t_x$, and an array of n × n
data values is recorded. To this array of data, a two-dimensional Fourier transform is
applied to obtain an image with n × n picture elements (pixels). This is the 2D
Fourier imaging method [3], which is most commonly used in clinical magnetic
resonance imaging at present. Further details of the procedures are given in Ref. 4.
Instead of applying a fixed gradient during an incremented time $t_x$, it is often more
convenient to apply an incremented gradient during a fixed time $t_x$; this variation is
called the spin-warp method [5]. Nowadays, n is typically 256 and allows a resolution
of rather less than a millimeter in the head and rather more than this in the body.
Needless to say, the instrument must have its own dedicated computer which instructs
the system, collects and stores the data, and performs the calculations.
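
The encoding and reconstruction steps can be mimicked numerically. The Python sketch
below is a highly idealized illustration, not the scanner software: relaxation, noise,
and slice selection are ignored, the indices `kx` and `ky` stand in for the incremented
phase-encoding step and the sampling time, and all names are our own. Each recorded
value is the sum of the proton densities weighted by phase factors that depend on x and
y, and the image is recovered by an inverse two-dimensional discrete Fourier transform.

```python
import cmath


def acquire(density):
    """Simulate the n x n array of encoded signal values for a square slice."""
    n = len(density)
    signal = [[0j] * n for _ in range(n)]
    for kx in range(n):                 # one phase-encoding step per repetition
        for ky in range(n):             # one sampled point of the decay
            total = 0j
            for x in range(n):
                for y in range(n):
                    phase = -2j * cmath.pi * (kx * x + ky * y) / n
                    total += density[x][y] * cmath.exp(phase)
            signal[kx][ky] = total
    return signal


def reconstruct(signal):
    """Recover the n x n image by an inverse 2D discrete Fourier transform."""
    n = len(signal)
    image = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            total = 0j
            for kx in range(n):
                for ky in range(n):
                    phase = 2j * cmath.pi * (kx * x + ky * y) / n
                    total += signal[kx][ky] * cmath.exp(phase)
            image[x][y] = (total / (n * n)).real
    return image


# A small hypothetical 4 x 4 "slice" with two regions of different proton density.
phantom = [[0.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.5, 0.0],
           [0.0, 1.0, 0.5, 0.0],
           [0.0, 0.0, 0.0, 0.0]]
image = reconstruct(acquire(phantom))
```

The direct double sums are used here only for clarity; for n = 256 a fast Fourier
transform would be used in practice.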

Figure 1 shows a proton magnetic resonance image of a thin transverse slice
through the author's head obtained on the 0.15 Tesla NMR scanner which has been in
operation in the University of Florida hospital for over three years. This image shows
the eyes, nose, scalp, the marrow in the skull, and the two hemispheres of the brain.
This image is one of a set of 10 parallel slices obtained simultaneously by the
multislice technique [4] in about 5 minutes.

As  mentioned  earlier,  NMR  images  may  be  obtained  readily  in  any  orientation,  and 
Figure  2  shows  a  sagittal  image  of  the  author’s  head,  also  from  a  set  of  10  coplanar 
images  obtained  simultaneously.  The  image  in  Figure  2  is  a  midline  section  showing 
the  scalp,  the  marrow  of  the  skull,  the  corrugations  of  the  cortex,  the  spinal  cord,  and 
the  cerebellum. 

Figure 3 shows one of a set of 10 sagittal images through the author's chest, showing
the vertebrae and the spinal cord. Images of the heart are blurred by its motion,
but this motional artifact can be removed by synchronizing the successive 90° pulses
with impulses from an electrocardiograph probe. Figure 4 shows one of a set of 10
transverse images through the author's abdomen showing the liver, spine in section,
aorta, intersections with the rib cage, and the two arms on either side of the body.

These four figures are examples of the quality of magnetic resonance images from
what is hopefully a normal human body using a low-field, first-generation instrument.
In July 1987 we expect to take delivery of a 1.5 Tesla superconducting NMR
imaging system in the University of Florida hospital which should provide images
with improved resolution and diagnostic capability.

It will be noticed that the images shown have all been obtained from the NMR
response of protons in the body. In fact, almost all clinical imaging is done with the
hydrogen nuclei. Hydrogen is the most abundant chemical element in the body,

Figure 1. Proton NMR image of a thin transverse slice through the author's head.

Figure 2. Proton NMR image of a midline sagittal section through the author's head.

Figure 3. Proton NMR image of a sagittal section through the author's chest.

Moreover, the proton is isotopically almost 100% abundant and has the best magnetic
properties of all stable nuclei. Oxygen has no suitable isotope. For carbon only the
isotope ¹³C is possible and it is only 1% abundant. Phosphorus is of considerable
interest because of its role in metabolism and ³¹P is a good NMR nucleus; however, its
concentration in the body is low, leading to a spatial resolution an order of magnitude
worse than hydrogen. Consequently, protons are by far the favorite nucleus for magnetic
resonance imaging. Some imaging work has been done with Li, ¹³C, ¹⁹F, ²³Na,
³¹P, and also with unpaired electrons, but this is all at the research level and not in
regular clinical practice.

The use of magnetic resonance imaging in hospitals depends on the extent to which
the physicians find it useful in their clinical work. One example in our University
hospital is provided by the pioneering work of Dr. W. F. Enneking on musculoskeletal
tumors. In the past, treatment usually consisted of amputation of the limb, but in
recent years the excision of the tumor has become a much more common practice. However,
this demands very precise delineation of the tumor. The longer NMR relaxation times
exhibited by tumors [6] enable the tumors to be demonstrated with superior contrast
to x-ray images. Some 350 cases of tumors of the musculoskeletal system have now
been examined with our magnetic resonance imaging (MRI) system over the past three
years and NMR images have played a major role in limb salvage surgery in these
cases [7].

Sometimes lesions of interest do not show a significant difference of relaxation
times T1 and T2 from normal tissue and are not well discriminated. In such cases, it
can be useful to administer a contrast agent, which enhances the relaxation rate
differentially. A popular agent is a solution of the gadolinium chelate of DTPA. The
gadolinium atom, a rare earth, has a strong electronic magnetic moment from its inner
unpaired 4f electrons, while the chelate group grasps the outer electrons, making the
molecules unreactive and nontoxic. Moreover, the large chelate molecule is less able
to penetrate the blood-brain barrier, but can penetrate tumors and other lesions,
relaxing them strongly and giving them good contrast in a relaxation-weighted image.
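
The effect of such an agent on a relaxation-weighted image can be illustrated with a
short Python sketch. It rests on two textbook assumptions that are not taken from this
work: the agent adds a term proportional to its concentration to the relaxation rate
1/T1, and the spin-echo signal follows the simplified expression
ρ(1 − exp(−TR/T1)) exp(−TE/T2). The function names, relaxivity, and tissue values are
hypothetical and chosen only for illustration.

```python
import math


def relaxation_time_with_agent(t_native_s, relaxivity_per_mM_s, concentration_mM):
    """Relaxation time (s) after adding an agent: rates (1/T) add linearly."""
    return 1.0 / (1.0 / t_native_s + relaxivity_per_mM_s * concentration_mM)


def spin_echo_signal(proton_density, t1_s, t2_s, tr_s, te_s):
    """Simplified spin-echo signal for repetition time TR and echo time TE."""
    return proton_density * (1.0 - math.exp(-tr_s / t1_s)) * math.exp(-te_s / t2_s)


# Hypothetical lesion with T1 = 1.0 s; the agent reaches 0.25 mM in the lesion
# and is assumed to have a relaxivity of 4.0 per mM per second.
t1_lesion = 1.0
t1_enhanced = relaxation_time_with_agent(t1_lesion, 4.0, 0.25)

before = spin_echo_signal(1.0, t1_lesion, 0.1, tr_s=0.5, te_s=0.02)
after = spin_echo_signal(1.0, t1_enhanced, 0.1, tr_s=0.5, te_s=0.02)
print(t1_enhanced, before, after)
```

With a short repetition time, the tissue that takes up the agent recovers more of its
magnetization between pulses and therefore appears brighter than tissue that does not.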

In the United States, pregnant women are not normally examined by MRI, but in
Britain national guidelines now allow this after the first trimester, which often proves
useful in cases of abnormality. Figure 5 shows an example, kindly provided by
Professor Brian Worthington of Nottingham University, of a 38-week-old fetus in a
breech presentation in the womb. The baby was subsequently safely delivered by
Caesarian section.

The subject of magnetic resonance imaging has made great strides since Lauterbur
[8] published the first NMR images of two tubes of water in 1973. Now, in 1987, it
can be said that MRI has become an accepted modality of medical imaging.
Approximately a million patients have now been examined. Just as you will all no doubt have
had many x-ray examinations, it will not be long before you will all have had the
experience of an MRI examination, and magnetic resonance will then become part of the
everyday language of the man-in-the-street.

In order to understand the molecular mechanisms underlying the function of Ca2+
channels and Ca2+-dependent modulatory proteins, we explore the characteristics of
Ca2+ binding sites in molecules for which the structures are known at the atomic
level. On such a structural basis, theoretical approaches formulated in the computational
methods of quantum chemistry and molecular mechanics (e.g., see Refs. 1 and
2) can be used to study the structural and electronic factors that determine the
mechanisms of Ca2+ binding and its consequences. One such approach is coded in the
CHARMM package of computer simulation software, which can be used to perform
both energy minimization and simulations of molecular dynamics for very large
molecular systems such as the calcium binding systems [3].

To make theoretical analysis of the Ca-binding mechanism possible, a first step in
the present study was the development of suitable parameters for the calculation of
interactions of Ca2+ with peptides and proteins with the CHARMM programs. Quantum
mechanical methods were used to calculate the energies of the complexes
[Ca(OH2)4]2+ and [Ca(OCH2)4]2+, chosen as model systems for the interaction of Ca2+
with coordinating groups in peptides, at a variety of Ca-O distances. The parameters
for the CHARMM program were then obtained by fitting the results to the analytical
expression for "nonbonded interactions" in CHARMM [3].
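
A fit of this kind can be sketched as follows. The Python example is an illustration
under stated assumptions rather than the procedure actually used here: the nonbonded
energy of a single Ca-O pair is taken as a Coulomb term with fixed charges plus a
Lennard-Jones term of the form ε[(Rmin/r)^12 − 2(Rmin/r)^6], the electrostatic constant
is an approximate value in kcal·Å/(mol·e²), and the function names and the synthetic
"quantum mechanical" energies are hypothetical. Once the Coulomb term is subtracted, the
remainder A/r^12 − B/r^6 is linear in A and B and can be fitted by ordinary least squares.

```python
COULOMB_CONSTANT = 332.06  # approximate, kcal*angstrom/(mol*e^2)


def nonbonded_energy(r, charge_product, epsilon, r_min):
    """Coulomb plus Lennard-Jones energy (kcal/mol) of one atom pair at distance r."""
    ratio6 = (r_min / r) ** 6
    return COULOMB_CONSTANT * charge_product / r + epsilon * (ratio6 ** 2 - 2.0 * ratio6)


def fit_lennard_jones(distances, energies, charge_product):
    """Least-squares estimate of (epsilon, r_min) with the charges held fixed."""
    # Remove the Coulomb part; what is left is modelled as A*u^2 - B*u, u = r^-6.
    residual = [e - COULOMB_CONSTANT * charge_product / r
                for r, e in zip(distances, energies)]
    u = [r ** -6 for r in distances]
    s44 = sum(x ** 4 for x in u)
    s33 = sum(x ** 3 for x in u)
    s22 = sum(x ** 2 for x in u)
    t2 = sum(x ** 2 * y for x, y in zip(u, residual))
    t1 = sum(x * y for x, y in zip(u, residual))
    det = s44 * s22 - s33 * s33
    a_coef = (t2 * s22 - s33 * t1) / det          # A = epsilon * r_min^12
    b_coef = (s33 * t2 - s44 * t1) / det          # B = 2 * epsilon * r_min^6
    r_min = (2.0 * a_coef / b_coef) ** (1.0 / 6.0)
    epsilon = b_coef ** 2 / (4.0 * a_coef)
    return epsilon, r_min


# Synthetic test data generated from known (hypothetical) parameters.
distances = [2.0 + 0.25 * k for k in range(13)]   # 2.0 to 5.0 angstroms
energies = [nonbonded_energy(r, -1.6, 0.12, 2.8) for r in distances]
epsilon_fit, r_min_fit = fit_lennard_jones(distances, energies, -1.6)
print(epsilon_fit, r_min_fit)
```

In a real parametrization the energies would come from the quantum mechanical
calculations on the model complexes, and all ligand atoms would contribute to the sum.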

The resulting parametrization was tested by calculations of the structure and of the
Ca2+-binding properties of two molecules that had been studied with experimental
methods including x-ray crystallography: the hexapeptide cyclo-(Pro-Gly)3 and its
2:1 complex with Ca2+ [4], and the 75-residue long intestinal calcium-binding
protein (ICaBP) [5]. The geometrical parameters calculated for the hexapeptide
(Fig. 1a) and its Ca2+-containing complex (Fig. 1b,c) were found to be in good
agreement with the experimental data from crystallography and from nuclear magnetic
resonance (NMR) in nonpolar media [4], both with respect to the conformation
of the peptides and the coordination of the calcium. Similarly, calculations of ICaBP
with CHARMM using the new Ca2+ parameters yielded good agreement with data
from x-ray crystallography (Fig. 2), when the appropriate constraints were applied.
Briefly, the results show that minimization of the structure without the inclusion of
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(a) 

Figure 1. (a) Energy-optimized structure of cyclo-(Pro-Gly)3. Note the hydrogen-bonding
arrangement of (Gly)N-H to O-C(Pro) which supports the C3 symmetry of the structure
observed also in nonpolar media [4]. (b) Energy-optimized structure of the complex of Ca2+
with two molecules of cyclo-(Pro-Gly)3 positioned above and below, viewed from above.
The residues are numbered 01-06 in one ring, and 07-12 in the second. Ca2+ is numbered
13, and the oxygens nearest to the calcium are identified by connecting lines. (c) Side view
of the energy-optimized structure of the complex shown in Fig. 1b. See legend to
Figure 1b for details.


the Ca2+ maintains the general tertiary structure of the ICaBP, which is composed of
two groups of helix-loop-helix arrangements linked by a "linker segment." However,
the calculations of the structure in the absence of calcium reveal significant changes
in the secondary structures of the two Ca-binding loops. The second loop (Loop II),
which is part of a classical "EF hand" defined by Kretsinger [6,7], is somewhat less
affected than Loop I, which does not seem to belong to an EF hand arrangement [7].
When energy minimization is performed with the Ca2+ in each of the two loops, the
Ca coordination pattern in the crystal is generally well reproduced and the structures
of the loops are restored, but the linker segment is distorted in comparison to the
crystal structure. To reproduce the geometry observed in the crystal it is necessary
to include in the calculations those water molecules that are observed in the crystal to
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(b) 



be nearest to atoms in the ICaBP, and thus likely to be tightly bound to the protein.
Inclusion of the water molecules identified to have close contacts with atoms belonging
to residues 42, 44, and 60 of ICaBP into the energy minimization calculation (see
Fig. 2) restores the calculated structure to the observed geometry. Taken together,
the results of energy minimization calculations of ICaBP emphasize the importance
of the tertiary structure in maintaining the Ca-coordinating secondary structure that is
required for calcium binding.

The conclusions regarding the role of the tertiary structure in the binding of Ca2+
were probed by an analysis of the Ca2+ binding ability of some proteins predicted to
bind the ion on the basis of sequence homology with calmodulin [8]. Using the
"mutation" procedure available in CHARMM, the sequence of Loop II in ICaBP was
consecutively mutated to that of each of the four loops identified in a yeast gene
product shown to have a certain degree of sequence homology with the four loops of
calmodulin [8]. The stabilization of Ca2+ by these loops was calculated from the
energy differences between optimized structures of the Ca-containing loop within the
rigid frame of ICaBP (which was kept fixed during the optimization of the loop
structure) and the energy of the same structure optimized in a similar fashion in the
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(c) 



absence of Ca2+. The stabilization of Ca2+ in two of the loops is found to be
sufficient to overcome the hydration energy of the ion, whereas for the other two loops
the difference between protein-binding energy and hydration is too small to permit a
definitive conclusion. For comparison purposes, mutations of Loop II into a series of
sequences constructed randomly were also analyzed for Ca2+-binding abilities. The
stabilization of Ca2+ in the random loops is calculated to be insufficient to overcome
the favorable hydration energy of the ion; these loops are predicted not to bind Ca2+.
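
The logic of this comparison can be summarized in a few lines of Python. The sketch
below is purely illustrative: the function name, the sign convention (more negative
energies are more stable), the uncertainty margin, and all numbers are hypothetical and
are not results of the calculations described here.

```python
def classify_loop(energy_without_ca, energy_with_ca, hydration_stabilization, margin):
    """Compare the stabilization of Ca2+ in a loop with the hydration of the ion.

    energy_without_ca, energy_with_ca: optimized energies of the loop in the fixed
        protein frame, without and with the ion (more negative = more stable).
    hydration_stabilization: positive magnitude of the ion's hydration energy.
    margin: differences smaller than this are treated as inconclusive.
    All quantities must be in the same energy units.
    """
    stabilization = energy_without_ca - energy_with_ca
    net = stabilization - hydration_stabilization
    if net > margin:
        return "predicted to bind"
    if net < -margin:
        return "predicted not to bind"
    return "inconclusive"


# Arbitrary numbers chosen only to exercise the three outcomes.
print(classify_loop(-100.0, -520.0, 380.0, 10.0))
print(classify_loop(-100.0, -485.0, 380.0, 10.0))
print(classify_loop(-100.0, -400.0, 380.0, 10.0))
```

The margin expresses the point made above that a small difference between
protein-binding energy and hydration does not permit a definitive conclusion.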

Molecular  dynamics  simulations  of  the  loop  movements  at  high  temperatures  were 
used  to  explore  the  restrictions  in  the  conformational  space  of  the  loop  surrounded  by 
the  fixed  protein  structure  of  ICaBP.  Results  from  these  studies  revealed  the  role  of 
sequence  in  the  stabilization  of  Ca2+  in  the  binding  loops,  as  well  as  the  degree  to 
which  the  conformational  space  available  to  the  loop  is  constrained  by  the  rest  of  the 
protein  region  that  yields  a  structure  prepared  for  Ca2+  binding. 

Preliminary results from molecular dynamics simulations of the behavior of ICaBP
at very low and very high temperatures support the conclusion that the tertiary
structure of the helix-loop-helix arrangement in the EF hand of the Ca2+-binding Loop II
restricts the conformational space of the loop and plays a major role in the binding of
the ion.
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Figure  2.  Energy-optimized  structure  of  ICaBP.  Only  the  alpha-carbon  chain  is  shown. 
The  sequence  is  numbered  at  every  fifth  residue.  Lettering  indicates  the  positions  of  the  two 
Ca2+ (Ca) ions bound in the loops, and the three tightly bound waters (W1, W2, W3) that
were included in the energy minimization calculations.
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Introduction 

The model of protobiological events that has been presented in these pages [1] has
increasing  relevance  to  pharmacological  research.  The  thermal  proteins*  that  function 
as  key  substances  in  the  proteinoid  theory  have  recently  been  found  to  prolong  the 
survival  of  rat  forebrain  neurons  in  culture  [2]  and  to  stimulate  the  growth  of  neurites. 

A  search  for  such  activity  in  thermal  proteins  added  to  cultures  of  modern  neurons 
was  suggested  by  the  fact  that  some  of  the  microspheres  assembled  from  proteinoids 
rich  in  hydrophobic  amino  acids  themselves  generate  fibrous  outgrowths  [3,4]. 


Experimental 

Thermal polycondensation of α-amino acids has been described [5], as has culturing
of rat neurons to which the polymers were added [6].

Results 

The  results  are  presented  in  Table  I.  A  comparison  of  these  results  with  those  from 
a  set  of  thermal  proteins  for  another  biological  activity  is  presented  in  Table  II. 

Discussion 

The model of protobiological development was made possible by the finding that
thermal polymerization of aspartic acid [7] could be extended to copolymerization.
The laboratory conditions are geological in nature. This process yielded heteropolymers
[8,9] that could even include all α-amino acids, nearly all of which, however,
would singly fail to polymerize or would decompose when heated.

The dicarboxylic amino acids play a special role in the polycondensation. Recent
results by Luque-Romero et al. [10] reconfirm the interpretation that aspartic acid
facilitates thermal polymerization whereas glutamic acid is involved in the ordering
process, starting as an N → C polymerization initiator [11]. The self-ordering is
strong enough [12,13] to yield thermal proteins limited in heterogeneity to a degree
comparable to that in unfractionated proteins in modern organisms [14]. Limited
heterogeneity of such polymers permits the practical aspect of repeatable biological
function [15].

*The term thermal protein was selected by Chemical Abstracts and has been used by them since 1972.
Other terms for such polymers are proteinoids, thermal proteinoids, and thermal copolyamino acids.

Table I. Thermal Proteins Active in Neuronal Survival in Culture

Test Material                 Cells/Dish (Relative survival*)
Controls                      0
Poly(asp,glu,trp;1:1:1)       60,000 ± 5500
Poly(asp,glu,trp;1:1:2)       58,900 ± 6700
Tryptophan                    0
Asp,glu,trp molar mixture     0
Poly(asp,glu,leu;1:1:1)       19,900 ± 1700
Poly(glu,trp;1:1)             0
Poly(trp)                     0
Poly(asp,glu;1:1)             0

*4 days from 200,000 cells

Table II. Some Thermal Proteins Exhibiting Specific Biological Activity.

Polymer                       Antiglyoxalase I    Neurotrophic
Glu,trp mixture (control)     0                   0
Copoly(asp,glu,trp)           +                   +
Copoly(asp,glu,leu)           0                   +
Copoly(asp,glu,val)           0                   +
Copoly(glu,trp)               +                   0

In discussion of the evolutionary position of thermal copolymerization of amino
acids, Calvin [16] has pointed out that the proposal would be supported if evolutionary
relics of the primordial self-ordering could be found. Working from the whole
protein data base as well as parts of it, Ivanov and Fortsch have recently reported that
in the majority of the cases they have studied, modern proteins are found to contain
such evolutionary relics [17].

As reported here, some of the thermal proteins rich in hydrophobic amino acids
such as tryptophan or leucine, when added to cultures of rat forebrain neurons,
prolong the survival. They also stimulate outgrowths of dendrites and axons [2]. The
fact that model polymers for primordial proteins have such effects on modern neurons
further supports the interpretation of an evolutionary sequence integrated by the
common chemistry of precellular and cellular proteins [11] and by the common
biofunctions [15,18].
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Pharmacological  potentialities  have  been  found  earlier  in  thermal  proteins 
[5,19,20]. Examination of these various functions reveals specificities for thermal
proteins as a model for specificities in modern proteins (Table II).
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